Kolaw-1.5: Gemma-2-2B Korean Law Fine-tuned Model

Model Description

This model is a fine-tuned version of google/gemma-2-2b specifically trained on Korean legal documents and statutes. It has been optimized to understand and generate responses related to Korean law, legal terminology, and judicial concepts.

Model Details

  • Base Model: google/gemma-2-2b
  • Model Size: 2.6B parameters
  • Fine-tuning Method: LoRA (Low-Rank Adaptation)
  • Language: Korean (한국어)
  • Domain: Legal/Law (법률)
  • License: Gemma License

Training Details

Training Data

  • Korean legal statutes and regulations
  • Legal document corpus including civil law, criminal law, commercial law, and constitutional law
  • Preprocessed and formatted in an instruction-following format (see the sketch below)
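
For illustration, a single training record in this format might look like the following; the question and answer here are hypothetical examples, not drawn from the actual corpus.

# Hypothetical training record in the instruction-following format
record = {
    "text": (
        "### 질문: 계약 해제와 해지의 차이는 무엇인가요?\n\n"
        "### 답변: 해제는 계약의 효력을 소급하여 소멸시키는 반면, "
        "해지는 장래에 향하여 계약 관계를 종료시킵니다."
    )
}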

Training Configuration

  • LoRA Rank: 8
  • LoRA Alpha: 16
  • Target Modules: q_proj, v_proj, k_proj, o_proj, gate_proj, up_proj, down_proj
  • Learning Rate: 1e-5
  • Batch Size: 2 (with gradient accumulation steps of 8)
  • Epochs: 3
  • Optimizer: AdamW
  • Scheduler: Linear with warmup
  • Max Sequence Length: 512
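
A minimal sketch of how this configuration might be expressed with Hugging Face PEFT and Transformers. The actual training script is not published, so values not listed above (LoRA dropout, warmup amount, output path) are assumptions and marked as such in the comments.

import torch
from transformers import AutoModelForCausalLM, TrainingArguments
from peft import LoraConfig, get_peft_model

# Base model loaded in float16 with eager attention
# (see Training Infrastructure below)
base_model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2-2b",
    torch_dtype=torch.float16,
    attn_implementation="eager"
)

# LoRA setup matching the values listed above
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj", "k_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_dropout=0.05,  # assumption: dropout is not stated in this card
    bias="none",
    task_type="CAUSAL_LM"
)
model = get_peft_model(base_model, lora_config)

# Trainer arguments matching the values listed above;
# examples are truncated to 512 tokens at tokenization time
training_args = TrainingArguments(
    output_dir="gemma-2-2b-korean-law",  # hypothetical output path
    learning_rate=1e-5,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
    num_train_epochs=3,
    optim="adamw_torch",
    lr_scheduler_type="linear",
    warmup_steps=100,  # assumption: warmup amount is not stated
    fp16=True
)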

Training Infrastructure

  • Framework: PyTorch + Transformers + PEFT
  • Hardware: NVIDIA T4
  • Platform: Kaggle
  • Precision: Float16
  • Attention Implementation: Eager (recommended for Gemma 2)

Usage

Loading the Model

from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch

# Load the base model and tokenizer
base_model_name = "google/gemma-2-2b"
adapter_name = "Rootpye/Kolaw-1.5"
tokenizer = AutoTokenizer.from_pretrained(base_model_name)
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    torch_dtype=torch.float16,
    device_map="auto",
    attn_implementation="eager"
)

# Attach the LoRA adapter weights
model = PeftModel.from_pretrained(base_model, adapter_name)
model.eval()
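
Optionally, the adapter weights can be folded into the base model with PEFT's merge_and_unload, which removes the PEFT wrapper in exchange for slightly faster standalone inference.

# Optional: merge the LoRA weights into the base model for standalone inference
model = model.merge_and_unload()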

Inference Example

def generate_legal_response(prompt, max_new_tokens=256):
    # Format the prompt using the model's expected template
    formatted_prompt = f"### 질문: {prompt}\n\n### 답변:"

    # Tokenize and move inputs to the model's device
    inputs = tokenizer(formatted_prompt, return_tensors='pt').to(model.device)

    # Generate
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            num_return_sequences=1,
            temperature=0.7,
            do_sample=True,
            pad_token_id=tokenizer.pad_token_id,
            eos_token_id=tokenizer.eos_token_id
        )
    
    # Decode and return
    response = tokenizer.decode(outputs[0], skip_special_tokens=True)
    return response.split("### 답변:")[-1].strip()

# Example usage
question = "민법에서 계약의 성립 요건에 대해 설명해주세요."
answer = generate_legal_response(question)
print(answer)

Prompt Format

The model works best with the following prompt format:

### 질문: [Your legal question in Korean]

### 답변: [Model's response will be generated here]

Performance

The model has been trained to:

  • Understand Korean legal terminology and concepts
  • Provide explanations of legal statutes and regulations
  • Interpret legal provisions in plain Korean
  • Answer questions about Korean civil, criminal, commercial, and constitutional law

Note: This model is for educational and informational purposes only. It should not be used as a substitute for professional legal advice.

Limitations

  • Not Legal Advice: Responses should not be considered as professional legal advice
  • Training Data Cutoff: Knowledge is limited to the training data and may not reflect recent legal changes
  • Hallucination: Like all language models, it may occasionally generate incorrect or misleading information
  • Scope: Primarily focused on Korean law; may not be accurate for other legal systems

Ethical Considerations

  • This model should be used responsibly and ethically
  • Users should verify important legal information with qualified legal professionals
  • The model may reflect biases present in the training data
  • Not intended for making actual legal decisions or providing official legal counsel

Citation

If you use this model in your research or applications, please cite:

@misc{Kolaw-1.5,
  title={Kolaw-1.5},
  author={Rootpye},
  year={2025},
  publisher={Rootpye},
  url={https://huggingface.co/Rootpye/Kolaw-1.5}
}

Acknowledgments

  • Google for the base Gemma-2-2B model
  • Hugging Face for the transformers library and model hosting
  • The open-source community for PEFT and LoRA implementations

Contact

For questions, issues, or collaborations, please open a discussion on the model's Hugging Face page: https://huggingface.co/Rootpye/Kolaw-1.5

Disclaimer: This model is provided "as is" without warranties. Users are responsible for ensuring compliance with applicable laws and regulations when using this model.
