README.md · SGEcon/EconFinKoSOLAR-10.7B at b3e87a36228b429e756e2300b3529aec26b491b7

metadata

library_name: transformers
tags:
  - finance
  - economic
license: cc-by-nc-4.0
datasets:
  - mncai/orca_dpo_pairs_ko
language:
  - ko
  - en
Basemodel: SGEcon/KoSOLAR-10.7B-v0.2_fin_v4

Model Details

Model Developers: Sogang University SGEconFinlab (https://sc.sogang.ac.kr/aifinlab/)

Model Description

This model is a language model specialized in economics and finance. It was trained on a variety of economics- and finance-related data. The data sources are listed below; we are not releasing the training data itself because it was used for research and policy purposes. If you wish to use the original data, please contact the original authors directly for permission.

Developed by: Sogang University SGEconFinlab (https://sc.sogang.ac.kr/aifinlab/)
License: cc-by-nc-4.0
Base Model: SGEcon/KoSOLAR-10.7B-v0.2_fin_v4 (https://huggingface.co/SGEcon/KoSOLAR-10.7B-v0.2_fin_v4)

Loading the Model

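import torch
from peft import PeftConfig, PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

peft_model_id = "SGEcon/KoSOLAR-10.7B-v0.2_fin_v4_dpo"
config = PeftConfig.from_pretrained(peft_model_id)
# 4-bit NF4 quantization with double quantization; computation runs in bfloat16
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)
# Load the quantized base model onto GPU 0, then attach the PEFT adapter
model = AutoModelForCausalLM.from_pretrained(config.base_model_name_or_path, quantization_config=bnb_config, device_map={"":0})
model = PeftModel.from_pretrained(model, peft_model_id)
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)
model.eval()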
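
As a rough guide to why the snippet above loads the weights in 4-bit, the following sketch estimates the memory taken by the weights alone. It assumes about 10.7 billion parameters (read off the model name) and ignores quantization constants, activations, and the KV cache, so real usage will be higher. The function name is hypothetical.

```python
def weight_memory_gib(n_params: float, bits_per_param: int) -> float:
    """Approximate memory for the weights alone, in GiB."""
    total_bytes = n_params * bits_per_param / 8
    return total_bytes / 2**30


# Assumption: ~10.7e9 parameters, taken from the "10.7B" in the model name.
N_PARAMS = 10.7e9

four_bit_gib = weight_memory_gib(N_PARAMS, 4)   # roughly 5 GiB
bf16_gib = weight_memory_gib(N_PARAMS, 16)      # roughly 20 GiB

print(f"4-bit weights: {four_bit_gib:.2f} GiB")
print(f"bf16 weights:  {bf16_gib:.2f} GiB")
```

Under these assumptions, 4-bit loading cuts the weight footprint to a quarter of the bfloat16 size.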

Conducting Conversation

import re

def gen(x):
    inputs = tokenizer(f"### 질문: {x}\n\n### 답변:", return_tensors='pt', return_token_type_ids=False)

    # Move data to GPU (if available)
    inputs = {k: v.to(device="cuda" if torch.cuda.is_available() else "cpu") for k, v in inputs.items()}

    gened = model.generate(
        **inputs,
        max_new_tokens=256,  # Maximum number of new tokens to create
        early_stopping=True,  # Only has an effect with beam search (num_beams > 1)
        num_return_sequences=1,  # Generate only one answer
        do_sample=True,  # Enable sampling to generate a variety of answers
        eos_token_id=tokenizer.eos_token_id,  # Using EOS Token IDs 
        temperature=0.9,  # This option is adjustable.
        top_p=0.8,  # This option is adjustable.
        top_k=100  # This option is adjustable.
    )

    # Decode the generated sequence and convert it to output text 
    decoded = tokenizer.decode(gened[0], skip_special_tokens=True).strip()

    # Extract only text after a string "### 답변:" 
    answer_start_idx = decoded.find("### 답변:") + len("### 답변:")
    complete_answer = decoded[answer_start_idx:].strip()

    # Find the last sentence-ending mark (. ? !) and drop any unfinished text after it
    match = re.search(r"[.?!][^.?!]*$", complete_answer)
    if match:
        complete_answer = complete_answer[:match.start() + 1].strip()

    return complete_answer
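
The post-processing step can be tried without loading the model. The sketch below is a standalone helper (the name `extract_answer` is hypothetical) that takes an already decoded string, keeps the text after the `### 답변:` marker, and trims it at the last sentence-ending mark. It assumes the goal of the trimming is to drop a final sentence cut off by `max_new_tokens`.

```python
import re

ANSWER_MARKER = "### 답변:"


def extract_answer(decoded: str) -> str:
    """Return the answer part of a decoded generation, trimmed to whole sentences."""
    idx = decoded.find(ANSWER_MARKER)
    # If the marker is missing, fall back to the whole string.
    answer = decoded[idx + len(ANSWER_MARKER):] if idx != -1 else decoded
    answer = answer.strip()

    # Match the last ". ? !" together with the unfinished text that follows it.
    match = re.search(r"[.?!][^.?!]*$", answer)
    if match:
        answer = answer[: match.start() + 1]
    return answer.strip()


sample = "### 질문: Q\n\n### 답변: First sentence. Second one! Third is cut"
print(extract_answer(sample))
```

Text without any sentence-ending mark is returned unchanged. A decimal point such as the one in "3.5" also counts as a mark here, so answers ending in a number may be trimmed more than intended.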

Training Details

We trained our model with PEFT, LoRA, DPO, and Merge.

Low-Rank Adaptation (LoRA) freezes the weights of the pretrained model and attaches trainable rank-decomposition matrices to each transformer layer, updating only these during fine-tuning. In other words, LoRA fine-tunes through a low-dimensional intrinsic rank (the number of dimensions that best describes the data for a given layer or parameter).
Parameter-Efficient Fine-Tuning (PEFT) tunes only a small subset of a model's parameters during fine-tuning instead of all of them. Because most parameters stay fixed, the model is less likely to suffer from catastrophic forgetting, in which it loses previously learned tasks while learning new ones. With only a few parameters to tune, a model can be trained for different tasks such as question answering, summarization, and generation.
Direct Preference Optimization (DPO) is an alternative to Reinforcement Learning from Human Feedback (RLHF). RLHF builds a reward function from human selections among multiple LLM answers to the same question, then runs reinforcement learning against that reward function to improve the model. DPO also uses preference data but trains on it directly, without a separate reward function. To build our DPO dataset, we selected relatively important examples from the data the base model was trained on, posed them to the base model, and sampled four answers for each. All four sampled answers were marked as rejected, and the original answer was marked as chosen. We then combined our dataset with the mncai/orca_dpo_pairs_ko dataset published on Hugging Face.
Merge is a way to mix two or more models into a single model. Because merging involves no training, it is very fast and requires only CPU computation.
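
To illustrate why merging needs no training, here is a minimal sketch of one simple merge method, a weighted average of matching parameters. This card does not say which merge method was used, so this is an illustration only, not the authors' procedure. Parameters are represented as plain lists of floats, and `linear_merge` is a hypothetical name.

```python
def linear_merge(state_dicts, weights):
    """Weighted average of parameters that share the same names and shapes."""
    if not state_dicts or len(state_dicts) != len(weights):
        raise ValueError("need one weight per model")
    names = set(state_dicts[0])
    if any(set(sd) != names for sd in state_dicts):
        raise ValueError("all models must have the same parameter names")

    total = sum(weights)
    merged = {}
    for name in state_dicts[0]:
        size = len(state_dicts[0][name])
        merged[name] = [
            sum(w * sd[name][i] for sd, w in zip(state_dicts, weights)) / total
            for i in range(size)
        ]
    return merged


# Toy "models" with a single two-element parameter each.
model_a = {"layer.weight": [1.0, 3.0]}
model_b = {"layer.weight": [3.0, 5.0]}

print(linear_merge([model_a, model_b], weights=[1, 1]))
```

The whole operation is arithmetic over stored weights, with no gradients or forward passes, which is why it can run on a CPU.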

Training Data

Our DPO dataset

It may not be used for commercial purposes and is therefore licensed under CC-BY-NC-4.0.

mncai/orca_dpo_pairs_ko(https://huggingface.co/datasets/mncai/orca_dpo_pairs_ko)
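
The pair construction described under Training Details (four sampled answers rejected, the original answer chosen) can be sketched as follows, together with the standard DPO loss for a single pair. The field names `prompt`, `chosen`, and `rejected`, the function names, and `beta=0.1` are assumptions for illustration; the card does not state the actual schema or the beta value used.

```python
import math


def build_dpo_pairs(prompt, original_answer, sampled_answers):
    """Pair the original answer (chosen) with each sampled answer (rejected)."""
    return [
        {"prompt": prompt, "chosen": original_answer, "rejected": sampled}
        for sampled in sampled_answers
    ]


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one pair: -log(sigmoid(beta * (chosen gain - rejected gain)))."""
    margin = beta * (
        (policy_chosen_logp - ref_chosen_logp)
        - (policy_rejected_logp - ref_rejected_logp)
    )
    # Numerically stable form of -log(sigmoid(margin)).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


pairs = build_dpo_pairs(
    prompt="What does a central bank do?",
    original_answer="reference answer from the training data",
    sampled_answers=["sample 1", "sample 2", "sample 3", "sample 4"],
)
print(len(pairs))
```

Each question yields four preference pairs that share the same chosen answer. The loss falls as the policy raises the log-probability of the chosen answer relative to the reference model more than it raises that of the rejected answer.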

Training Hyperparameters

Hyperparameter	SGEcon/KoSOLAR-10.7B-v0.2_fin_v4_dpo
LoRA method	LoRA
load in 4 bit	True
learning rate	1e-5
lr scheduler	cosine
lora alpha	8
lora rank	32
lora dropout	0.05
optim	adamw_torch
target_modules	q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj, lm_head
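
To make the LoRA rows of the table concrete, the sketch below collects them in a dictionary whose keys follow the argument names of `peft.LoraConfig` (`r`, `lora_alpha`, `lora_dropout`, `target_modules`) and derives two quantities from them: the standard LoRA scaling factor `lora_alpha / r`, and the number of trainable parameters LoRA adds to one projection. The 4096 x 4096 projection shape is an illustrative assumption, not a value from this card, and the helper names are hypothetical.

```python
# Values copied from the hyperparameter table above.
LORA_HYPERPARAMETERS = {
    "r": 32,
    "lora_alpha": 8,
    "lora_dropout": 0.05,
    "target_modules": [
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj", "lm_head",
    ],
}


def lora_scaling(lora_alpha: int, r: int) -> float:
    """Factor applied to the low-rank update in standard LoRA."""
    return lora_alpha / r


def lora_trainable_params(d_in: int, d_out: int, r: int) -> int:
    """Parameters in A (r x d_in) plus B (d_out x r) for one adapted matrix."""
    return r * (d_in + d_out)


scaling = lora_scaling(LORA_HYPERPARAMETERS["lora_alpha"], LORA_HYPERPARAMETERS["r"])

# Assumption: a square 4096 x 4096 projection, chosen only for illustration.
D_IN = D_OUT = 4096
added = lora_trainable_params(D_IN, D_OUT, LORA_HYPERPARAMETERS["r"])
fraction = added / (D_IN * D_OUT)

print(scaling, added, fraction)
```

With alpha 8 and rank 32 the update is scaled by 0.25, and for the assumed shape the adapter trains 262,144 parameters, about 1.6% of the 16,777,216 in the frozen matrix.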

License

This model is distributed under the Creative Commons Attribution-NonCommercial 4.0 International Public License.

Example

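중앙은행의 역할에 대해서 설명해줄래?
(English translation: Can you explain the role of a central bank?)

우선 중앙은행이 뭐하는 곳인지 설명할게. 중앙은행은 정부와 경제주체 사이의 중재자 같은 역할을 하는 곳이야. 중앙은행은 통화정책, 금융시스템의 안전을 담당하고 있고, 법정통화를 발행하는 곳이야. 중앙은행이 통화정책을 통해 경제성장을 지원하고, 물가를 조절해 경제를 관리하는거야. 중앙은행이 돈을 너무 많이 풀면 물가가 오르고, 돈을 너무 많이 걷으면 경제가 침체되는데, 중앙은행은 경제가 안정될 수 있게 끔 통화량을 조절하는거야. 중앙은행은 금융시스템의 안전을 담당하고 있어. 만약 어떤 은행이 대출을 너무 많이 해서 위기에 처한다면, 중앙은행은 은행에 돈을 줘 은행을 지원해서 금융시스템의 안전을 지킬 수 있게 해. 마지막으로 중앙은행은 법정통화를 발행하는 곳이야. 법정통화는 정부가 지정한 화폐로, 정부가 보증하는 통화라는거야. 법정통화는 우리가 많이 사용하는 화폐인데, 중앙은행은 법정통화를 발행하고 유통시켜 경제가 원활하게 돌아갈 수 있게 해. 이렇게 중앙은행은 경제를 관리하는 역할을 하고, 통화정책, 금융시스템의 안전을 담당하고, 법정통화를 발행해서 경제가 원활하게 돌아갈 수 있게 해.
(English translation: First, let me explain what a central bank does. A central bank acts as a kind of mediator between the government and economic agents. It is responsible for monetary policy and the safety of the financial system, and it issues legal tender. Through monetary policy, the central bank supports economic growth and manages the economy by controlling prices. If the central bank releases too much money, prices rise, and if it withdraws too much money, the economy slumps, so the central bank adjusts the money supply to keep the economy stable. The central bank is responsible for the safety of the financial system. If a bank gets into trouble by lending too much, the central bank gives it money to support it and so protects the financial system. Finally, the central bank issues legal tender. Legal tender is the currency designated by the government, meaning a currency the government guarantees. Legal tender is the money we commonly use, and the central bank issues and circulates it so that the economy runs smoothly. In this way, the central bank manages the economy, takes charge of monetary policy and the safety of the financial system, and issues legal tender so that the economy can run smoothly.)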