---
base_model: unsloth/meta-llama-3.1-8b-instruct-bnb-4bit
library_name: peft
---

# SCoReLoRA: Self-Correct via Reinforcement Learning

SCoReLoRA combines Low-Rank Adaptation (LoRA) fine-tuning with reinforcement learning to teach a language model to correct its own responses. A two-stage training process improves the model's ability to revise an initial answer into a more accurate, refined one.

## Features

- Implements a two-stage training process for self-correction
- Utilizes reinforcement learning to improve model outputs
- Compatible with Hugging Face's Transformers library and PEFT (see the loading sketch after this list)
- Supports quantized models for efficient fine-tuning
- Includes evaluation metrics for self-correction performance
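
The adapter should load with the standard Transformers + PEFT workflow on top of the quantized base model named in the metadata. The sketch below is an assumption of that workflow rather than code from this repository; the adapter id `your-username/SCoReLoRA-adapter` is a placeholder, and bitsandbytes must be installed for the 4-bit base model.

```python
# Minimal loading sketch (assumes transformers, peft, and bitsandbytes are installed).
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "unsloth/meta-llama-3.1-8b-instruct-bnb-4bit"
adapter_id = "your-username/SCoReLoRA-adapter"  # placeholder: replace with the actual adapter repo

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")

# Attach the LoRA adapter on top of the 4-bit base model.
model = PeftModel.from_pretrained(base_model, adapter_id)

prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "What is 17 * 24? Check your work."}],
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

output = model.generate(prompt, max_new_tokens=256)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```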

## How It Works

SCoReLoRA uses a two-stage training process:

1. **Stage I**: The model is trained to generate an initial response and then correct it, with a KL-divergence penalty that keeps the fine-tuned model close to the base model and prevents the first attempt from drifting during training.
2. **Stage II**: The model is further trained using reinforcement learning, with rewards based on the quality of its self-corrections.

The training process uses shaped rewards together with a KL-divergence penalty to balance improvement against staying close to the original model's behavior.
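
The exact loss is not spelled out above, but a minimal sketch of the idea, assuming a binary correctness reward, a progress bonus from the first to the second attempt, and a per-token KL penalty against the frozen base model, could look like this (the function names and the shaping coefficients are illustrative, not the actual implementation):

```python
import torch
import torch.nn.functional as F

def shaped_reward(correct_first: bool, correct_second: bool, alpha: float = 0.5) -> float:
    """Hypothetical shaping: reward the corrected answer, plus a bonus
    (or penalty) proportional to the change from attempt 1 to attempt 2."""
    r1, r2 = float(correct_first), float(correct_second)
    return r2 + alpha * (r2 - r1)

def kl_regularized_pg_loss(logits, ref_logits, actions, reward, beta=0.05):
    """REINFORCE-style loss with a per-token KL penalty toward the frozen base model.

    logits:     (seq, vocab) logits from the fine-tuned (LoRA) policy
    ref_logits: (seq, vocab) logits from the frozen base model
    actions:    (seq,) sampled token ids
    reward:     scalar shaped reward for the whole completion
    """
    logp = F.log_softmax(logits, dim=-1)
    ref_logp = F.log_softmax(ref_logits, dim=-1)
    action_logp = logp.gather(-1, actions.unsqueeze(-1)).squeeze(-1)  # log-prob of taken tokens
    kl = (logp.exp() * (logp - ref_logp)).sum(-1).mean()              # mean per-token KL(pi || pi_ref)
    return -(reward * action_logp.mean()) + beta * kl

# Toy usage with random tensors standing in for real model outputs.
torch.manual_seed(0)
seq_len, vocab = 6, 10
logits = torch.randn(seq_len, vocab, requires_grad=True)
ref_logits = torch.randn(seq_len, vocab)
actions = torch.randint(vocab, (seq_len,))

r = shaped_reward(correct_first=False, correct_second=True)  # a successful correction
loss = kl_regularized_pg_loss(logits, ref_logits, actions, r)
loss.backward()
print(f"shaped reward = {r:.2f}, loss = {loss.item():.3f}")
```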

## Evaluation

The implementation includes functions to evaluate the model's self-correction capabilities, measuring metrics such as the following (a computation sketch appears after the list):

- Accuracy before and after correction
- Improvement rate (change in accuracy from the first to the second attempt)
- Rate of successful corrections (incorrect first attempt fixed by the second)
- Rate of erroneous corrections (correct first attempt broken by the second)
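
A minimal sketch of how these metrics can be computed, assuming each evaluation example is reduced to a pair of booleans (first attempt correct, second attempt correct); the function name and dictionary keys are illustrative:

```python
from typing import List, Tuple

def self_correction_metrics(results: List[Tuple[bool, bool]]) -> dict:
    """Compute self-correction metrics from (first_correct, second_correct) pairs."""
    n = len(results)
    acc_before = sum(first for first, _ in results) / n
    acc_after = sum(second for _, second in results) / n
    fixed = sum((not first) and second for first, second in results) / n   # incorrect -> correct
    broken = sum(first and (not second) for first, second in results) / n  # correct -> incorrect
    return {
        "accuracy_attempt1": acc_before,
        "accuracy_attempt2": acc_after,
        "improvement": acc_after - acc_before,
        "successful_correction_rate": fixed,
        "erroneous_correction_rate": broken,
    }

# Example: four problems with (first attempt, second attempt) correctness flags.
print(self_correction_metrics([(False, True), (True, True), (True, False), (False, False)]))
```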

## Reference

- [Training Language Models to Self-Correct via Reinforcement Learning](https://arxiv.org/abs/2409.12917)