Gemma-2-2B Korean Law Fine-tuned Model
Model Description
This model is a fine-tuned version of google/gemma-2-2b specifically trained on Korean legal documents and statutes. It has been optimized to understand and generate responses related to Korean law, legal terminology, and judicial concepts.
Model Details
- Base Model: google/gemma-2-2b
- Model Size: 2.6B parameters
- Fine-tuning Method: LoRA (Low-Rank Adaptation)
- Language: Korean (한국어)
- Domain: Legal/Law (법률)
- License: Gemma License
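The 2.6B figure can be reproduced from the base model's architecture. The sketch below is a back-of-the-envelope check, not an official count. It assumes these values from the published google/gemma-2-2b `config.json`:

- hidden size 2304 and MLP size 9216
- 26 layers
- 8 query heads and 4 key/value heads, each of dimension 256
- vocabulary of 256,000 with tied input/output embeddings
- four RMSNorm weight vectors per layer and no bias terms

```python
# Assumed Gemma 2 2B hyperparameters (taken from the published config.json).
HIDDEN = 2304
INTERMEDIATE = 9216
LAYERS = 26
N_HEADS = 8
N_KV_HEADS = 4
HEAD_DIM = 256
VOCAB = 256_000


def gemma2_2b_param_count() -> int:
    """Count weights in a Gemma 2 2B-style decoder (no biases, tied embeddings)."""
    q_dim = N_HEADS * HEAD_DIM        # 2048
    kv_dim = N_KV_HEADS * HEAD_DIM    # 1024
    # q_proj + k_proj + v_proj + o_proj
    attention = HIDDEN * q_dim + 2 * HIDDEN * kv_dim + q_dim * HIDDEN
    mlp = 3 * HIDDEN * INTERMEDIATE   # gate_proj, up_proj, down_proj
    norms = 4 * HIDDEN                # four RMSNorm weight vectors per layer
    per_layer = attention + mlp + norms
    embeddings = VOCAB * HIDDEN       # shared with the output head
    final_norm = HIDDEN
    return LAYERS * per_layer + embeddings + final_norm


print(f"{gemma2_2b_param_count():,}")  # 2,614,341,888
```

Under these assumptions, the token embedding table alone accounts for about 590M of the roughly 2.6B weights.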
Training Details
Training Data
- Korean legal statutes and regulations
- Legal document corpus including civil law, criminal law, commercial law, and constitutional law
- Preprocessed into an instruction-following (question/answer) format
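The card does not publish the exact training template. The sketch below assumes the training examples used the same `### 질문:` / `### 답변:` template as the inference prompt in the Usage section, with Gemma's `<eos>` token appended so the model learns where an answer ends. The names `format_training_example` and `format_dataset` and the record keys `question` and `answer` are hypothetical.

```python
from typing import Iterable

# Hypothetical: assumes the same template as the inference prompt
# (질문 = "Question", 답변 = "Answer").
PROMPT_TEMPLATE = "### 질문: {question}\n\n### 답변:"


def format_training_example(question: str, answer: str, eos_token: str = "<eos>") -> str:
    """Render one question/answer pair as a single causal-LM training string."""
    prompt = PROMPT_TEMPLATE.format(question=question.strip())
    # A space separates the answer from the marker; the EOS token marks the end.
    return f"{prompt} {answer.strip()}{eos_token}"


def format_dataset(records: Iterable[dict]) -> list[str]:
    """Format records that carry hypothetical 'question' and 'answer' keys."""
    return [format_training_example(r["question"], r["answer"]) for r in records]
```

With this approach, the prompt portion of each string is identical to what the model later sees at inference time, so the training and inference formats cannot drift apart.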
Training Configuration
- LoRA Rank: 8
- LoRA Alpha: 16
- Target Modules: q_proj, v_proj, k_proj, o_proj, gate_proj, up_proj, down_proj
- Learning Rate: 1e-5
- Batch Size: 2 per device, with 8 gradient accumulation steps (effective batch size 16)
- Epochs: 3
- Optimizer: AdamW
- Scheduler: Linear with warmup
- Max Sequence Length: 512
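The settings above imply a few derived quantities:

- the LoRA scaling factor (alpha / r) that multiplies each low-rank update
- the number of trainable adapter weights when all seven listed projections are adapted
- the effective batch size per optimizer step

The sketch below computes these using assumed Gemma 2 2B projection shapes (hidden 2304, MLP 9216, 26 layers, q projection 2048 wide, k/v projections 1024 wide). The resulting total is an estimate, not a figure from the training logs.

The sketch also implements a linear warmup-then-decay schedule of the kind the Scheduler line describes. The card does not state warmup or total step counts, so any values passed to `linear_warmup_lr` are hypothetical.

```python
# Assumed Gemma 2 2B projection shapes: name -> (in_features, out_features).
PROJ_SHAPES = {
    "q_proj": (2304, 2048),
    "k_proj": (2304, 1024),
    "v_proj": (2304, 1024),
    "o_proj": (2048, 2304),
    "gate_proj": (2304, 9216),
    "up_proj": (2304, 9216),
    "down_proj": (9216, 2304),
}
NUM_LAYERS = 26

LORA_RANK = 8
LORA_ALPHA = 16
PER_DEVICE_BATCH = 2
GRAD_ACCUM_STEPS = 8


def lora_trainable_params(rank: int) -> int:
    """Each adapted W (out x in) gains A (rank x in) and B (out x rank)."""
    per_layer = sum(rank * (d_in + d_out) for d_in, d_out in PROJ_SHAPES.values())
    return NUM_LAYERS * per_layer


def linear_warmup_lr(step: int, base_lr: float, warmup_steps: int, total_steps: int) -> float:
    """Ramp linearly from 0 to base_lr, then decay linearly to 0 at total_steps."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


scaling = LORA_ALPHA / LORA_RANK                        # multiplier applied to B @ A
effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM_STEPS   # sequences per optimizer step
print(scaling, effective_batch, f"{lora_trainable_params(LORA_RANK):,}")  # 2.0 16 10,383,360
```

Under these assumptions, about 10.4M adapter weights are trained, roughly 0.4% of the base model. The frozen 2.6B base weights are only read, never updated.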
Training Infrastructure
- Framework: PyTorch + Transformers + PEFT
- Hardware: NVIDIA T4
- Platform: Kaggle
- Precision: Float16
- Attention Implementation: Eager (recommended by Transformers for Gemma 2 training)
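Gemma 2 applies logit soft-capping to its attention scores. This is one commonly cited reason the eager attention path is preferred for training it: some fused attention kernels did not implement this step when Gemma 2 was released. The plain-Python sketch below shows the operation itself. The cap of 50.0 is the `attn_logit_softcapping` value assumed from the google/gemma-2-2b config, and the helper names are hypothetical.

```python
import math

# Assumed value of attn_logit_softcapping in the Gemma 2 2B config.
ATTN_SOFTCAP = 50.0


def soft_cap(logit: float, cap: float = ATTN_SOFTCAP) -> float:
    """Smoothly squash a logit into (-cap, cap): cap * tanh(logit / cap)."""
    return cap * math.tanh(logit / cap)


def capped_scores(scores: list[float], cap: float = ATTN_SOFTCAP) -> list[float]:
    """Apply soft-capping to a row of raw attention scores before the softmax."""
    return [soft_cap(s, cap) for s in scores]


print([round(x, 2) for x in capped_scores([1.0, 50.0, 1000.0])])  # [1.0, 38.08, 50.0]
```

Small scores pass through almost unchanged, while very large ones saturate near the cap. This bounds the logits fed to the softmax.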
Usage
Loading the Model
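```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load the fine-tuned model and its tokenizer from the Hugging Face Hub.
# device_map="auto" requires the `accelerate` package. If the repository holds
# only LoRA adapter weights, also install `peft`; Transformers can then load
# the adapter on top of the base model.
model_name = "Rootpye/Kolaw-1.5"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",
    attn_implementation="eager",  # same attention implementation as in training
)
```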
Inference Example
```python
def generate_legal_response(prompt, max_new_tokens=256):
    # Format the prompt (질문 = "Question", 답변 = "Answer")
    formatted_prompt = f"### 질문: {prompt}\n\n### 답변:"

    # Tokenize (input IDs plus attention mask) and move to the model's device
    inputs = tokenizer(formatted_prompt, return_tensors="pt").to(model.device)

    # Generate
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=True,
            temperature=0.7,
            pad_token_id=tokenizer.pad_token_id,
            eos_token_id=tokenizer.eos_token_id,
        )

    # Decode only the newly generated tokens, not the echoed prompt
    new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()


# Example usage: "Please explain the requirements for forming a contract under the Civil Act."
question = "민법에서 계약의 성립 요건에 대해 설명해주세요."
answer = generate_legal_response(question)
print(answer)
```
Prompt Format
Use the following prompt format (질문 = "Question", 답변 = "Answer"). It matches `formatted_prompt` in the inference example:
```
### 질문: [Your legal question in Korean]

### 답변: [Model's response will be generated here]
```
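The template uses plain-text markers rather than special tokens. As a result, a sampled completion can run past the answer and begin another `### 질문:` block before the EOS token is produced. The helpers below are a minimal post-processing sketch under that assumption. `build_prompt` and `extract_answer` are hypothetical names, not part of this repository, and `extract_answer` expects only the newly generated text, not the echoed prompt.

```python
QUESTION_MARKER = "### 질문:"  # "Question:"
ANSWER_MARKER = "### 답변:"    # "Answer:"


def build_prompt(question: str) -> str:
    """Render a question in the template, leaving the answer for the model."""
    return f"{QUESTION_MARKER} {question.strip()}\n\n{ANSWER_MARKER}"


def extract_answer(completion: str) -> str:
    """Trim a completion (newly generated text only) to its first answer.

    If the model opens another question block, everything from that marker
    onward is discarded.
    """
    return completion.split(QUESTION_MARKER, 1)[0].strip()


# "답변 내용" = "answer text", "새 질문" = "new question"
print(extract_answer(" 답변 내용\n\n### 질문: 새 질문"))  # 답변 내용
```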
Performance
The model was fine-tuned to:
- Understand Korean legal terminology and concepts
- Provide explanations of legal statutes and regulations
- Interpret legal provisions in plain Korean
- Answer questions about Korean civil, criminal, commercial, and constitutional law
Note: This model is for educational and informational purposes only. It should not be used as a substitute for professional legal advice.
Limitations
- Not Legal Advice: Responses should not be treated as professional legal advice
- Training Data Cutoff: Knowledge is limited to the training data and may not reflect recent legal changes
- Hallucination: Like all language models, it may occasionally generate incorrect or misleading information
- Scope: Primarily focused on Korean law; may not be accurate for other legal systems
Ethical Considerations
- This model should be used responsibly and ethically
- Users should verify important legal information with qualified legal professionals
- The model may reflect biases present in the training data
- Not intended for making actual legal decisions or providing official legal counsel
Citation
If you use this model in your research or applications, please cite:
```bibtex
@misc{Kolaw-1.5,
  title={Kolaw-1.5},
  author={Rootpye},
  year={2025},
  publisher={Rootpye},
  url={https://huggingface.co/Rootpye/Kolaw-1.5}
}
```
Acknowledgments
- Google for the base Gemma-2-2B model
- Hugging Face for the transformers library and model hosting
- The open-source community for PEFT and LoRA implementations
Contact
For questions, issues, or collaborations, please:
- Open an issue on this model's repository
- Contact: roootpi@gmail.com
Disclaimer: This model is provided "as is" without warranties. Users are responsible for ensuring compliance with applicable laws and regulations when using this model.