---
library_name: transformers
license: apache-2.0
---

Model Details

Model Developers: Sogang University SGEconFinlab (https://sc.sogang.ac.kr/aifinlab/)

Model Description

This model is a language model specialized in economics and finance. It was trained on a variety of economics- and finance-related data; the sources are listed below. We are not releasing the training data itself because it was used for research and policy purposes. If you wish to use the original data rather than our processed training data, please contact the original authors directly for permission.

How to Get Started with the Model

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftConfig, PeftModel

peft_model_id = "SGEcon/KoSOLAR-10.7B-v0.2_fin_v4"
config = PeftConfig.from_pretrained(peft_model_id)

# Load the base model in 4-bit NF4 quantization to reduce memory usage.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)
model = AutoModelForCausalLM.from_pretrained(config.base_model_name_or_path, quantization_config=bnb_config, device_map={"": 0})
# Attach the LoRA adapter weights on top of the quantized base model.
model = PeftModel.from_pretrained(model, peft_model_id)
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)
model.eval()

def gen(x):
    inputs = tokenizer(f"### ์งˆ๋ฌธ: {x}\n\n### ๋‹ต๋ณ€:", return_tensors='pt', return_token_type_ids=False)

    # Move data to GPU (if available)
    inputs = {k: v.to(device="cuda" if torch.cuda.is_available() else "cpu") for k, v in inputs.items()}

    gened = model.generate(
        **inputs,
        max_new_tokens=256,
        early_stopping=True,
        num_return_sequences=4,  
        do_sample=True,
        eos_token_id=tokenizer.eos_token_id,  
        temperature=0.9,
        top_p=0.8,
        top_k=50
    )

    complete_answers = []
    for gen_seq in gened:
        decoded = tokenizer.decode(gen_seq, skip_special_tokens=True).strip()

        # Extract only the text after the string "### ๋‹ต๋ณ€:"
        first_answer_start_idx = decoded.find("### ๋‹ต๋ณ€:") + len("### ๋‹ต๋ณ€:")
        temp_answer = decoded[first_answer_start_idx:].strip()

        # Extract only text up to the second "### ๋‹ต๋ณ€:" string
        second_answer_start_idx = temp_answer.find("### ๋‹ต๋ณ€:")
        if second_answer_start_idx != -1:
            complete_answer = temp_answer[:second_answer_start_idx].strip()
        else:
            complete_answer = temp_answer  # If there is no second "### ๋‹ต๋ณ€:", return the whole answer
    
        complete_answers.append(complete_answer)

    return complete_answers
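
A minimal usage sketch is shown below; the question is only an illustrative example, and gen returns the four sampled answers:

# Illustrative question: "How does a rise in the exchange rate affect exports?"
answers = gen("ํ™˜์œจ์ด ์ƒ์Šนํ•˜๋ฉด ์ˆ˜์ถœ์—๋Š” ์–ด๋–ค ์˜ํ–ฅ์ด ์žˆ๋‚˜์š”?")
for i, answer in enumerate(answers, 1):
    print(f"[{i}] {answer}\n")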

Training Details

First, we loaded the base model quantized to 4 bits. Quantization significantly reduces the memory needed to store the model's weights and intermediate computation results, which is beneficial for deploying models in memory-constrained environments, and it can also provide faster inference. We then attached LoRA adapters to the quantized base model and fine-tuned only the adapter parameters, using the hyperparameters listed under Training Hyperparameters below.
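
For reference, a rough sketch of how such a LoRA setup is typically attached with the peft library is shown below. The rank, alpha, dropout, and target modules mirror the hyperparameter table further down; the variable names and the prepare_model_for_kbit_training step are assumptions, not the exact training script.

from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Assumes `model` is the base model already loaded in 4-bit as in the section above.
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=16,                # lora rank (see hyperparameter table below)
    lora_alpha=16,       # lora alpha
    lora_dropout=0.05,   # lora dropout
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj", "lm_head"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable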

Training Data

  1. ํ•œ๊ตญ์€ํ–‰: ๊ฒฝ์ œ๊ธˆ์œต์šฉ์–ด 700์„ (https://www.bok.or.kr/portal/bbs/B0000249/view.do?nttId=235017&menuNo=200765)
  2. ๊ธˆ์œต๊ฐ๋…์›: ๊ธˆ์œต์†Œ๋น„์ž ์ •๋ณด ํฌํ„ธ ํŒŒ์ธ ๊ธˆ์œต์šฉ์–ด์‚ฌ์ „(https://fine.fss.or.kr/fine/fnctip/fncDicary/list.do?menuNo=900021)
  3. KDI ๊ฒฝ์ œ์ •๋ณด์„ผํ„ฐ: ์‹œ์‚ฌ ์šฉ์–ด์‚ฌ์ „(https://eiec.kdi.re.kr/material/wordDic.do)
  4. ํ•œ๊ตญ๊ฒฝ์ œ์‹ ๋ฌธ/ํ•œ๊ฒฝ๋‹ท์ปด: ํ•œ๊ฒฝ๊ฒฝ์ œ์šฉ์–ด์‚ฌ์ „(https://terms.naver.com/list.naver?cid=42107&categoryId=42107), ์˜ค๋Š˜์˜ TESAT(https://www.tesat.or.kr/bbs.frm.list/tesat_study?s_cateno=1), ์˜ค๋Š˜์˜ ์ฃผ๋‹ˆ์–ด TESAT(https://www.tesat.or.kr/bbs.frm.list/tesat_study?s_cateno=5), ์ƒ๊ธ€์ƒ๊ธ€ํ•œ๊ฒฝ(https://sgsg.hankyung.com/tesat/study)
  5. ์ค‘์†Œ๋ฒค์ฒ˜๊ธฐ์—…๋ถ€/๋Œ€ํ•œ๋ฏผ๊ตญ์ •๋ถ€: ์ค‘์†Œ๋ฒค์ฒ˜๊ธฐ์—…๋ถ€ ์ „๋ฌธ์šฉ์–ด(https://terms.naver.com/list.naver?cid=42103&categoryId=42103)
  6. ๊ณ ์„ฑ์‚ผ/๋ฒ•๋ฌธ์ถœํŒ์‚ฌ: ํšŒ๊ณ„ยท์„ธ๋ฌด ์šฉ์–ด์‚ฌ์ „(https://terms.naver.com/list.naver?cid=51737&categoryId=51737)
  7. ๋งจํ์˜ ๊ฒฝ์ œํ•™ 8ํŒ Word Index
  8. yanolja/KoSOLAR-10.7B-v0.2(<yanolja/KoSOLAR-10.7B-v0.2>)

Training Procedure

Training Hyperparameters

| Hyperparameter | SGEcon/KoSOLAR-10.7B-v0.2_fin_v4 |
|---|---|
| LoRA method | LoRA |
| load in 4 bit | True |
| learning rate | 1e-5 |
| lr scheduler | linear |
| lora alpha | 16 |
| lora rank | 16 |
| lora dropout | 0.05 |
| optim | paged_adamw_32bit |
| target_modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj, lm_head |
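
The optimizer and schedule in the table map onto transformers training arguments roughly as sketched below; the output path, batch size, and epoch count are placeholders not stated in this card.

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./kosolar-fin-lora",   # placeholder path
    learning_rate=1e-5,                # learning rate from the table
    lr_scheduler_type="linear",        # lr scheduler
    optim="paged_adamw_32bit",         # optimizer
    num_train_epochs=1,                # assumption, not stated in this card
    per_device_train_batch_size=1,     # assumption, not stated in this card
    bf16=True,                         # matches bnb_4bit_compute_dtype=torch.bfloat16
)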

Evaluation

Testing Data, Factors & Metrics

Testing Data

[More Information Needed]

Results

[More Information Needed]

Summary

Citation