---
pipeline_tag: text-generation
license: apache-2.0
tags:
- text generation
programming_language:
- Java
- JavaScript
- Python
metrics:
- code_eval
inference: true
widget:
- text: 'def print_hello_world():'
  example_title: Hello world
  group: Python
model-index:
- name: DeciCoder-1b
  results:
  - task:
      type: text-generation
    dataset:
      type: nuprl/MultiPL-E
      name: MultiPL-HumanEval (Python)
    metrics:
    - name: pass@1
      type: pass@1
      value: 0.191
      verified: false
  - task:
      type: text-generation
    dataset:
      type: nuprl/MultiPL-E
      name: MultiPL-HumanEval (JavaScript)
    metrics:
    - name: pass@1
      type: pass@1
      value: 0.184
      verified: false
  - task:
      type: text-generation
    dataset:
      type: nuprl/MultiPL-E
      name: MultiPL-HumanEval (Java)
    metrics:
    - name: pass@1
      type: pass@1
      value: 0.166
      verified: false
datasets:
- bigcode/starcoderdata
---

# Model Card for DeciCoder 1B

DeciCoder 1B is a 1 billion parameter decoder-only code completion model
trained on the Python, Java, and JavaScript subsets of the [StarCoder Training Dataset](https://huggingface.co/datasets/bigcode/starcoderdata).
The model uses Grouped Query Attention and has a context window of 2048
tokens. It was trained using a Fill-in-the-Middle training objective. The model's
architecture was generated by Deci's proprietary Neural Architecture
Search-based technology, AutoNAC.

## Model Details

- **Developed by:** Deci
- **Model type:** DeciCoder is an auto-regressive language model based on the transformer decoder architecture, using Grouped Query Attention.
- **Language(s):** Python, Java, JavaScript
- **License:** Model checkpoints are licensed under the [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) license.

## Model Architecture

| Parameters | Layers | Heads | Sequence Length | GQA num_key_value_heads | Hidden Size |
|:----------|:----------|:----------|:----------|:----------|:----------|
| 1.1B | 20 | 32 | 2048 | 4 | 2048 |

- **Decoder layer:** Grouped Query Attention [Ainslie et al., 2023](https://arxiv.org/abs/2305.13245)
- **Position Embeddings:** Rotary Position Embeddings [Su et al., 2021](https://arxiv.org/abs/2104.09864)

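
To make the table's numbers concrete, the sketch below works out what 32 query heads sharing 4 key/value heads implies for the KV cache. It assumes the per-head dimension is `hidden_size / num_heads`, which is the usual convention but is not stated in this card; `AttentionShape` and its methods are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AttentionShape:
    """Values taken from the architecture table above."""
    hidden_size: int = 2048
    num_heads: int = 32
    num_key_value_heads: int = 4
    num_layers: int = 20

    @property
    def head_dim(self) -> int:
        # Assumption: heads split the hidden size evenly.
        return self.hidden_size // self.num_heads

    @property
    def queries_per_kv_head(self) -> int:
        # Number of query heads that share one key/value head.
        return self.num_heads // self.num_key_value_heads

    def kv_cache_elements(self, seq_len: int, kv_heads: Optional[int] = None) -> int:
        """Count cached key and value elements for one sequence."""
        heads = self.num_key_value_heads if kv_heads is None else kv_heads
        # Factor 2 covers keys and values.
        return 2 * self.num_layers * heads * self.head_dim * seq_len


shape = AttentionShape()
gqa_cache = shape.kv_cache_elements(seq_len=2048)
mha_cache = shape.kv_cache_elements(seq_len=2048, kv_heads=shape.num_heads)
print(shape.head_dim, shape.queries_per_kv_head, mha_cache // gqa_cache)
```

Under these assumptions, each key/value head serves 8 query heads, so the cache is 8 times smaller than it would be with one key/value head per query head.
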
## Uses

The model is intended for single-line and multi-line code completion from a
context window of up to 2048 tokens. It is *not* an instruction model,
and commands like "Write a function that computes the absolute value of
an integer" won't yield the desired results. A more effective approach
is to frame instructions in the style of source code comments (e.g.
`# this function calculates the absolute value of an integer`) or to present
a function signature and docstring, enabling the model to complete the
function's body.

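
The two prompt styles described above can be sketched as plain string builders. `comment_prompt` and `signature_prompt` are hypothetical helpers, not part of any library, and the exact whitespace that works best with the model is an assumption.

```python
def comment_prompt(task: str) -> str:
    """Frame a task as a Python comment for the model to continue from."""
    return f"# {task.strip()}\n"


def signature_prompt(signature: str, docstring: str) -> str:
    """Present a function signature plus docstring, leaving the body to the model."""
    return f'{signature.rstrip()}\n    """{docstring.strip()}"""\n'


print(comment_prompt("this function calculates the absolute value of an integer"))
print(signature_prompt(
    "def absolute_value(x: int) -> int:",
    "Return the absolute value of x.",
))
```

Either string can be passed to the tokenizer in place of an instruction-style request.
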
### How to Use

```python
# pip install -q transformers
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "Deci/DeciCoder-1b"
device = "cuda"  # use "cpu" to run without a GPU

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
# trust_remote_code=True is required because the model class ships with the checkpoint
model = AutoModelForCausalLM.from_pretrained(checkpoint, trust_remote_code=True).to(device)

# Encode a code prefix and let the model complete it
inputs = tokenizer.encode("def print_hello_world():", return_tensors="pt").to(device)
outputs = model.generate(inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0]))
```

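
Because the model was trained with a Fill-in-the-Middle objective, it may also be possible to ask it to fill a gap between a prefix and a suffix. The sketch below only builds the prompt string. It assumes the tokenizer uses StarCoder-style sentinel tokens (`<fim_prefix>`, `<fim_suffix>`, `<fim_middle>`) in prefix-suffix-middle order, which this card does not confirm; check `tokenizer.additional_special_tokens` before relying on it. `build_fim_prompt` is a hypothetical helper.

```python
# Assumed sentinel tokens (StarCoder convention); verify against the tokenizer.
FIM_PREFIX = "<fim_prefix>"
FIM_SUFFIX = "<fim_suffix>"
FIM_MIDDLE = "<fim_middle>"


def build_fim_prompt(prefix: str, suffix: str) -> str:
    """Arrange code before and after the gap so the model generates the middle."""
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}"


prompt = build_fim_prompt(
    prefix="def add(a, b):\n    ",
    suffix="\n\nprint(add(1, 2))",
)
print(prompt)
```

The resulting string would be encoded and passed to `model.generate` in the same way as the plain completion prompt above.
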
### Attribution

DeciCoder was trained on the StarCoder Training Dataset, filtered for
Python, Java, and JavaScript code. For additional information, please
refer to [https://huggingface.co/datasets/bigcode/starcoderdata](https://huggingface.co/datasets/bigcode/starcoderdata).

### Limitations

The model was trained on source code in Python, Java, and
JavaScript. While the primary natural language in the source is English,
other languages are also present. The model can produce code snippets
given some context, but there's no assurance that the resulting
code will function as expected. It might be suboptimal, contain bugs, or
even contain exploits.

## Training Details

### Training Data

DeciCoder was trained on the Python, Java, and JavaScript subsets of the [StarCoder Training Dataset](https://huggingface.co/datasets/bigcode/starcoderdata).

### Training Procedure

- **Warm-Up Steps**: 9000
- **Total Training Steps**: 284k
- **Total Tokens**: 446B
- **Global Batch Size**: 768
- **Optimizer**: AdamW
- **Optimizer Parameters**: beta1=0.9, beta2=0.95
- **Weight Decay**: 0.1
- **Learning Rate**: 4e-4
- **Learning Rate Schedule**: cosine

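
The hyperparameters above can be tied together with a short sketch. It assumes linear warm-up to the peak learning rate, cosine decay to a floor of zero, and a global batch size counted in sequences of 2048 tokens; none of these details are stated in this card, so treat the function as an illustration rather than the schedule actually used.

```python
import math

WARMUP_STEPS = 9_000
TOTAL_STEPS = 284_000
PEAK_LR = 4e-4
GLOBAL_BATCH_SIZE = 768  # assumed to be counted in sequences
SEQUENCE_LENGTH = 2048


def learning_rate(step: int, min_lr: float = 0.0) -> float:
    """Linear warm-up followed by cosine decay (assumed shape)."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = min((step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS), 1.0)
    return min_lr + 0.5 * (PEAK_LR - min_lr) * (1.0 + math.cos(math.pi * progress))


# Sanity check on the reported token budget.
tokens_per_step = GLOBAL_BATCH_SIZE * SEQUENCE_LENGTH
total_tokens = tokens_per_step * TOTAL_STEPS
print(f"{tokens_per_step:,} tokens/step, {total_tokens / 1e9:.1f}B tokens total")
```

Under these assumptions the token count comes to roughly 446.7B, which is consistent with the reported 446B given that the step count is rounded to 284k.
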
## Evaluation

Below are DeciCoder's pass@1 scores on MultiPL-HumanEval.

| Python | JavaScript | Java |
|:----------|:----------|:----------|
| 19.1% | 18.4% | 16.6% |

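
For readers unfamiliar with the metric, pass@k is commonly computed with the unbiased estimator introduced alongside HumanEval: generate `n` samples per problem, count the `c` that pass the unit tests, and estimate the chance that at least one of `k` randomly drawn samples passes. The sketch below implements that estimator; whether the scores above were produced this way, and with how many samples, is not stated in this card.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem: 1 - C(n - c, k) / C(n, k)."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Every draw of k samples must include at least one passing sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


# With k=1 the estimate reduces to the fraction of passing samples.
print(pass_at_k(n=10, c=3, k=1))
```

The benchmark score is the mean of this value over all problems in the suite.
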
### Runtime Benchmarks

| Inference Tool/Hardware | A10 (tokens/sec) | A100 (tokens/sec) |
|:----------|:----------|:----------|
| HF Inference Endpoints | 1,364.2 | 3,244.4 |
| Infery LLM | 3,889.3 | 11,676.8 |

- Throughput (tokens/sec) was measured with the optimal batch size for each hardware type: batch size 128 on the A10 and batch size 512 on the A100.

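
The relative speedup implied by the table can be computed directly. This is simple arithmetic on the reported figures, not an additional measurement.

```python
# Throughput in tokens/sec, copied from the table above.
throughput = {
    "A10": {"HF Inference Endpoints": 1364.2, "Infery LLM": 3889.3},
    "A100": {"HF Inference Endpoints": 3244.4, "Infery LLM": 11676.8},
}

# Ratio of Infery LLM throughput to HF Inference Endpoints throughput per GPU.
speedup = {
    gpu: round(tools["Infery LLM"] / tools["HF Inference Endpoints"], 2)
    for gpu, tools in throughput.items()
}
print(speedup)
```

This gives a ratio of about 2.85 on the A10 and about 3.6 on the A100.
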
## Documentation

- [Notebook](https://colab.research.google.com/drive/1JCxvBsWCZKHfIcHSMVf7GZCs3ClMQPjs)
- Blog post: [Introducing DeciCoder: The New Gold Standard in Efficient and Accurate Code Generation](https://deci.ai/blog/decicoder-efficient-and-accurate-code-generation-llm/)
- Questions: Feel free to contact us via our [Discord Community!](https://discord.com/invite/p9ecgRhDR8/)

## How to Cite

Please cite this model using this format.

```bibtex
@misc{DeciFoundationModels,
title = {DeciCoder},
author = {DeciAI Research Team},
year = {2023},
url = {https://huggingface.co/deci/decicoder-1b},
}
```