library_name: transformers
tags:
- code
- trl
- qwen2
- aether code
license: other
datasets:
- thesven/AetherCode-v1
language:
- en
Model Card for Aether-Qwen2-0.5B-SFT-v0.0.2-GPTQ
This repo contains a 4-bit GPTQ quantization of the Aether-Qwen2-0.5B-SFT-v0.0.2 model.
This model is a fine-tune of Qwen2 0.5B, trained with Supervised Fine-Tuning (SFT) on the AetherCode-v1 dataset for code-related tasks. The aim is to specialize the base Qwen2 model for software development contexts.
Model Details
Model Description
Aether-Qwen2-0.5B-SFT-v0.0.2 is a transformer model that can be loaded with the Hugging Face 🤗 transformers library and is intended to assist with automated coding tasks. It has been fine-tuned via Supervised Fine-Tuning (SFT) to better understand and generate code, targeting applications in software development, code review, and automated programming assistance.
- Developed by: Michael Svendsen
- Finetuned from model: Qwen2 0.5B
Uses
Direct Use
This model is ready for direct use in environments where coding assistance is needed, providing capabilities such as code completion, error detection, and suggestions for code optimization.
Downstream Use
Further fine-tuning on specific coding languages or frameworks can extend its utility to more specialized software development tasks.
Out-of-Scope Use
The model should not be used for general natural language processing tasks outside the scope of programming and code analysis.
Bias, Risks, and Limitations
Users should be cautious about relying solely on the model for critical software development tasks without human oversight, due to potential biases in training data or limitations in understanding complex code contexts.
Recommendations
Ongoing validation and testing on diverse coding datasets are recommended to ensure the model remains effective and unbiased.
How to Get Started with the Model
Use the code below to get started with the model.
from transformers import AutoModel
# Load the quantized checkpoint from the Hugging Face Hub
model = AutoModel.from_pretrained("thesven/Aether-Qwen2-0.5B-SFT-v0.0.2-GPTQ")
Or with a pipeline:
from transformers import pipeline
messages = [
    {"role": "system", "content": "You are a helpful software development assistant"},
    {"role": "user", "content": "can you write a python function that adds 3 numbers together?"},
]
pipe = pipeline("text-generation", model="thesven/Aether-Qwen2-0.5B-SFT-v0.0.2-GPTQ")
print(pipe(messages))
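The pipeline call above prints the full result structure rather than just the model's answer. The following is a minimal sketch of pulling out the reply text, assuming a recent transformers release in which a chat-style input returns the whole conversation as a list of message dicts under `generated_text`. The helper name `extract_reply` and the `mock_outputs` data are hypothetical and hand-written for illustration; they are not real model output.

```python
from typing import Any


def extract_reply(outputs: list[dict[str, Any]]) -> str:
    """Return the generated text from a text-generation pipeline result.

    Assumption: `outputs` has the shape [{"generated_text": ...}], where the
    value is either a plain string (string prompt) or a list of message
    dicts (chat-style prompt) whose last entry is the new assistant turn.
    """
    generated = outputs[0]["generated_text"]
    if isinstance(generated, str):
        # String prompts yield a plain string.
        return generated
    # Chat prompts yield the conversation; the last message is the reply.
    return generated[-1]["content"]


# Hand-written stand-in for `pipe(messages)`; not real model output.
mock_outputs = [
    {
        "generated_text": [
            {"role": "system", "content": "You are a helpful software development assistant"},
            {"role": "user", "content": "can you write a python function that adds 3 numbers together?"},
            {"role": "assistant", "content": "def add_three(a, b, c):\n    return a + b + c"},
        ]
    }
]

reply = extract_reply(mock_outputs)
print(reply)
```

With the real pipeline, the equivalent call would be `extract_reply(pipe(messages))`.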
Prompt Template:
<|im_start|>system
{system}<|im_end|>
<|im_start|>user
{user}<|im_end|>
<|im_start|>assistant
{assistant}
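When the prompt is assembled by hand instead of through the pipeline, the template above can be filled in with a small helper. This is a sketch only: the function name `build_prompt` is hypothetical, and it assumes the prompt should end right after the assistant header so that the model generates the `{assistant}` part itself.

```python
def build_prompt(system: str, user: str) -> str:
    """Fill the prompt template with a system and a user message.

    The string ends after the assistant header; the model's completion
    takes the place of {assistant} in the template.
    """
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )


prompt = build_prompt(
    "You are a helpful software development assistant",
    "can you write a python function that adds 3 numbers together?",
)
print(prompt)
```

If the tokenizer for this repo ships a chat template, `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)` is usually the safer way to produce this string, since it follows whatever template is stored with the model.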
Training Details
Training Data
The model was trained using the 5star split from the AetherCode-v1 dataset, designed for enhancing coding-related AI capabilities.
Training Procedure
Training regime: The model was trained for 3 epochs on an RTX 4500 using Supervised Fine-Tuning (SFT).
Preprocessing
Standard preprocessing techniques were applied to prepare the code data for training.