Kolaw-1.5: Gemma-2-2B Korean Law Fine-tuned Model

Model Description

This model is a fine-tuned version of google/gemma-2-2b specifically trained on Korean legal documents and statutes. It has been optimized to understand and generate responses related to Korean law, legal terminology, and judicial concepts.

Model Details

  • Base Model: google/gemma-2-2b
  • Model Size: 2.6B parameters
  • Fine-tuning Method: LoRA (Low-Rank Adaptation)
  • Language: Korean (한국어)
  • Domain: Legal/Law (법률)
  • License: Gemma License

Training Details

Training Data

  • Korean legal statutes and regulations
  • Legal document corpus including civil law, criminal law, commercial law, and constitutional law
  • Preprocessed and formatted in an instruction-following format (see the sketch below)
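
For illustration, a single training record in this format might look like the following; the question and answer here are hypothetical examples, not drawn from the actual corpus.

# Hypothetical training record in the instruction-following format
record = {
    "text": (
        "### 질문: 계약 해제와 해지의 차이는 무엇인가요?\n\n"
        "### 답변: 해제는 계약의 효력을 소급하여 소멸시키는 반면, "
        "해지는 장래에 향하여 계약 관계를 종료시킵니다."
    )
}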

Training Configuration

  • LoRA Rank: 8
  • LoRA Alpha: 16
  • Target Modules: q_proj, v_proj, k_proj, o_proj, gate_proj, up_proj, down_proj
  • Learning Rate: 1e-5
  • Batch Size: 2 (with gradient accumulation steps of 8)
  • Epochs: 3
  • Optimizer: AdamW
  • Scheduler: Linear with warmup
  • Max Sequence Length: 512
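
A minimal sketch of how this configuration might be expressed with Hugging Face PEFT and Transformers. The actual training script is not published, so values not listed above (LoRA dropout, warmup amount, output path) are assumptions and marked as such in the comments.

import torch
from transformers import AutoModelForCausalLM, TrainingArguments
from peft import LoraConfig, get_peft_model

# Base model loaded in float16 with eager attention
# (see Training Infrastructure below)
base_model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2-2b",
    torch_dtype=torch.float16,
    attn_implementation="eager"
)

# LoRA setup matching the values listed above
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj", "k_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_dropout=0.05,  # assumption: dropout is not stated in this card
    bias="none",
    task_type="CAUSAL_LM"
)
model = get_peft_model(base_model, lora_config)

# Trainer arguments matching the values listed above;
# examples are truncated to 512 tokens at tokenization time
training_args = TrainingArguments(
    output_dir="gemma-2-2b-korean-law",  # hypothetical output path
    learning_rate=1e-5,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
    num_train_epochs=3,
    optim="adamw_torch",
    lr_scheduler_type="linear",
    warmup_steps=100,  # assumption: warmup amount is not stated
    fp16=True
)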

Training Infrastructure

  • Framework: PyTorch + Transformers + PEFT
  • Hardware: NVIDIA T4
  • Platform: Kaggle
  • Precision: Float16
  • Attention Implementation: Eager (recommended for Gemma 2)

Usage

Loading the Model

from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch

# Load the base model and tokenizer
base_model_name = "google/gemma-2-2b"
adapter_name = "Rootpye/Kolaw-1.5"
tokenizer = AutoTokenizer.from_pretrained(base_model_name)
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    torch_dtype=torch.float16,
    device_map="auto",
    attn_implementation="eager"
)

# Attach the LoRA adapter weights
model = PeftModel.from_pretrained(base_model, adapter_name)
model.eval()
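
Optionally, the adapter weights can be folded into the base model with PEFT's merge_and_unload, which removes the PEFT wrapper in exchange for slightly faster standalone inference.

# Optional: merge the LoRA weights into the base model for standalone inference
model = model.merge_and_unload()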

Inference Example

def generate_legal_response(prompt, max_new_tokens=256):
    # Format the prompt using the model's expected template
    formatted_prompt = f"### 질문: {prompt}\n\n### 답변:"

    # Tokenize and move inputs to the model's device
    inputs = tokenizer(formatted_prompt, return_tensors='pt').to(model.device)

    # Generate
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            num_return_sequences=1,
            temperature=0.7,
            do_sample=True,
            pad_token_id=tokenizer.pad_token_id,
            eos_token_id=tokenizer.eos_token_id
        )
    
    # Decode and return
    response = tokenizer.decode(outputs[0], skip_special_tokens=True)
    return response.split("### 답변:")[-1].strip()

# Example usage
question = "민법에서 계약의 성립 요건에 대해 설명해주세요."
answer = generate_legal_response(question)
print(answer)

Prompt Format

The model works best with the following prompt format:

### 질문: [Your legal question in Korean]

### 답변: [Model's response will be generated here]

Performance

The model has been trained to:

  • Understand Korean legal terminology and concepts
  • Provide explanations of legal statutes and regulations
  • Interpret legal provisions in plain Korean
  • Answer questions about Korean civil, criminal, commercial, and constitutional law

Note: This model is for educational and informational purposes only. It should not be used as a substitute for professional legal advice.

Limitations

  • Not Legal Advice: Responses should not be considered as professional legal advice
  • Training Data Cutoff: Knowledge is limited to the training data and may not reflect recent legal changes
  • Hallucination: Like all language models, it may occasionally generate incorrect or misleading information
  • Scope: Primarily focused on Korean law; may not be accurate for other legal systems

Ethical Considerations

  • This model should be used responsibly and ethically
  • Users should verify important legal information with qualified legal professionals
  • The model may reflect biases present in the training data
  • Not intended for making actual legal decisions or providing official legal counsel

Citation

If you use this model in your research or applications, please cite:

@misc{Kolaw-1.5,
  title={Kolaw-1.5},
  author={Rootpye},
  year={2025},
  publisher={Rootpye},
  url={https://huggingface.co/Rootpye/Kolaw-1.5}
}

Acknowledgments

  • Google for the base Gemma-2-2B model
  • Hugging Face for the transformers library and model hosting
  • The open-source community for PEFT and LoRA implementations

Contact

For questions, issues, or collaborations, please open a discussion on the model's Hugging Face page: https://huggingface.co/Rootpye/Kolaw-1.5

Disclaimer: This model is provided "as is" without warranties. Users are responsible for ensuring compliance with applicable laws and regulations when using this model.
