Vietnamese Legal Reasoning Model - GRPO Fine-tuned

🏛️ Model Description

This model is a Vietnamese legal reasoning specialist fine-tuned using Group Relative Policy Optimization (GRPO) on Vietnamese legal question-answering data. It's specifically designed to perform syllogistic reasoning for Vietnamese legal scenarios.

🎯 Base Model

This model is fine-tuned from thangvip/qwen3-1.7b-legal-pretrain-synthetic-8k, a legal-domain continued-pretraining checkpoint of Qwen3-1.7B.
🔥 Key Features

  • Syllogistic Reasoning: Structured legal arguments (Major Premise → Minor Premise → Conclusion)
  • Vietnamese Legal Domain: Trained on Vietnamese legal texts and Q&A
  • GRPO Optimization: Reinforcement learning with group-relative advantages for improved reasoning
  • Citation Support: Generates responses with legal citations
  • Structured Output: Uses XML-like tags for organized responses

📊 Model Architecture

  • Parameters: ~1.7B
  • Vocabulary Size: 151936
  • Hidden Size: 2048
  • Layers: 28
  • Attention Heads: 16
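
These values can be double-checked against the published checkpoint without downloading the weights; a minimal sketch using transformers' AutoConfig (the expected values in the comments come from the list above):

from transformers import AutoConfig

# Load only the configuration, not the weights.
config = AutoConfig.from_pretrained("thangvip/qwen3-1.7b-vietnamese-legal-grpo")

print(config.vocab_size)           # expected: 151936
print(config.hidden_size)          # expected: 2048
print(config.num_hidden_layers)    # expected: 28
print(config.num_attention_heads)  # expected: 16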

🚀 Quick Start

Installation

pip install transformers torch

Basic Usage

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load model and tokenizer
model_name = "thangvip/qwen3-1.7b-vietnamese-legal-grpo"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

# Format your legal question
system_prompt = """Bạn là một chuyên gia pháp lý. Hãy trả lời câu hỏi bằng cách sử dụng phương pháp lập luận tam đoạn luận (syllogism).

Trước tiên, hãy suy nghĩ về vấn đề trong thẻ <think></think>.

Sau đó, trả lời theo định dạng sau:
<answer>
<major_premise>[Quy định pháp luật chung]</major_premise>
<minor_premise>[Sự kiện cụ thể trong câu hỏi]</minor_premise>
<conclusion>[Áp dụng quy định vào sự kiện để đưa ra kết luận]</conclusion>
</answer>

Hãy đảm bảo trích dẫn chính xác các điều luật liên quan."""

question = "Một công ty có nghĩa vụ gì khi sa thải nhân viên do tái cơ cấu?"

# Create conversation
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": question}
]

# Generate response
input_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=1024,
        temperature=0.7,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id
    )

response = tokenizer.decode(outputs[0][inputs['input_ids'].shape[1]:], skip_special_tokens=True)
print(response)

Pipeline Usage

import torch
from transformers import pipeline

# Create text generation pipeline
generator = pipeline(
    "text-generation",
    model="thangvip/qwen3-1.7b-vietnamese-legal-grpo",
    tokenizer="thangvip/qwen3-1.7b-vietnamese-legal-grpo",
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

# Generate legal reasoning
prompt = "Câu hỏi: Quyền và nghĩa vụ của người thuê nhà khi hợp đồng thuê hết hạn?"
result = generator(prompt, max_new_tokens=512, temperature=0.7)
print(result[0]['generated_text'])
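
Recent transformers versions also let the text-generation pipeline accept chat-style message lists and apply the model's chat template automatically, so the same system prompt from the basic-usage example can be reused; a minimal sketch (assuming such a version is installed):

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": "Quyền và nghĩa vụ của người thuê nhà khi hợp đồng thuê hết hạn?"},
]
result = generator(messages, max_new_tokens=512, temperature=0.7)
# With chat input, generated_text holds the full conversation; the last turn is the reply.
print(result[0]["generated_text"][-1]["content"])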

🎯 Training Details

Training Procedure

  • Method: Group Relative Policy Optimization (GRPO); a minimal training sketch follows this list
  • Base Model: thangvip/qwen3-1.7b-legal-pretrain-synthetic-8k
  • Training Steps: N/A
  • Learning Rate: N/A
  • Batch Size: N/A
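
The sketch below shows, very roughly, what a GRPO run with TRL's GRPOTrainer can look like. The dataset file, reward function, and hyperparameters are illustrative assumptions, not the actual training configuration (which is not published here):

from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

# Hypothetical prompt dataset with a "prompt" column of Vietnamese legal questions.
dataset = load_dataset("json", data_files="legal_qa_prompts.json", split="train")

def format_reward(completions, **kwargs):
    # Toy reward: 1.0 if the completion contains the expected syllogism tags, else 0.0.
    required = ["<major_premise>", "<minor_premise>", "<conclusion>"]
    return [float(all(tag in c for tag in required)) for c in completions]

training_args = GRPOConfig(
    output_dir="qwen3-1.7b-vietnamese-legal-grpo",
    num_generations=8,            # completions sampled per prompt (the "group")
    max_completion_length=1024,
    learning_rate=1e-6,
    per_device_train_batch_size=8,
)

trainer = GRPOTrainer(
    model="thangvip/qwen3-1.7b-legal-pretrain-synthetic-8k",
    reward_funcs=format_reward,
    args=training_args,
    train_dataset=dataset,
)
trainer.train()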

Training Data

  • Domain: Vietnamese legal question-answering
  • Format: Syllogistic reasoning pairs
  • Structure: Question → Structured legal reasoning response

Reward System

The model was trained with a weighted, multi-component reward function (a sketch of how the components combine follows this list):

  • Correctness (40%): Factual accuracy against reference answers
  • Format Compliance (25%): Proper use of syllogistic structure
  • Citation Accuracy (15%): Relevant and accurate legal citations
  • Reasoning Quality (15%): Quality of legal reasoning process
  • Hallucination Penalty (5%): Penalty for unsupported claims
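
A minimal sketch of how such per-component scores could be folded into a single scalar reward. The scoring functions themselves and the choice to subtract the hallucination term are assumptions; only the weights come from the list above:

# Weights from the reward breakdown above; the per-component scores (all in [0, 1])
# are assumed to come from separate, unspecified scoring functions.
REWARD_WEIGHTS = {
    "correctness": 0.40,
    "format": 0.25,
    "citation": 0.15,
    "reasoning": 0.15,
    "hallucination_penalty": 0.05,
}

def combined_reward(scores: dict) -> float:
    """Combine per-component scores into a single scalar reward."""
    return (
        REWARD_WEIGHTS["correctness"] * scores["correctness"]
        + REWARD_WEIGHTS["format"] * scores["format"]
        + REWARD_WEIGHTS["citation"] * scores["citation"]
        + REWARD_WEIGHTS["reasoning"] * scores["reasoning"]
        - REWARD_WEIGHTS["hallucination_penalty"] * scores["hallucination"]
    )

# Example: well-formatted answer, strong citations, one unsupported claim.
print(combined_reward({
    "correctness": 0.9, "format": 1.0, "citation": 0.8,
    "reasoning": 0.7, "hallucination": 1.0,
}))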

📝 Expected Output Format

The model generates structured responses in this format:

<think>
[Internal reasoning about the legal question]
</think>

<answer>
<major_premise>
[General legal rule or principle applicable to the situation]
</major_premise>

<minor_premise>
[Specific facts from the question that relate to the legal rule]
</minor_premise>

<conclusion>
[Legal conclusion that follows logically from applying the rule to the facts]
</conclusion>
</answer>
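
Because the tags are fixed, the individual components can be extracted with simple pattern matching; a minimal sketch (assuming the tags appear exactly as documented):

import re

def parse_legal_response(text: str) -> dict:
    """Extract the reasoning and syllogism components from a generated response."""
    parts = {}
    for tag in ("think", "major_premise", "minor_premise", "conclusion"):
        match = re.search(rf"<{tag}>(.*?)</{tag}>", text, re.DOTALL)
        parts[tag] = match.group(1).strip() if match else None
    return parts

# `response` is the decoded output from the Basic Usage example above.
parsed = parse_legal_response(response)
print(parsed["conclusion"])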

🎯 Use Cases

  • Legal Education: Teaching legal reasoning methodology
  • Legal Research: Preliminary analysis of legal questions
  • Document Drafting: Structured legal argument generation
  • Legal Consultation: Initial legal guidance (with human review)

⚠️ Limitations

  • Domain Specific: Optimized for Vietnamese legal context
  • Educational Purpose: Should not replace professional legal advice
  • Fact Checking Required: Always verify legal citations and conclusions
  • Context Window: Limited by base model's context length

📄 Citation

If you use this model, please cite:

@misc{vietnamese-legal-grpo-2024,
  title={Vietnamese Legal Reasoning Model with GRPO},
  author={Your Name},
  year={2024},
  publisher={Hugging Face},
  url={https://huggingface.co/thangvip/qwen3-1.7b-vietnamese-legal-grpo}
}

🤝 Contributing

Contributions are welcome! Please see our contributing guidelines.

📜 License

This model is released under the Apache 2.0 License.

🙏 Acknowledgments

  • TRL Team: For the GRPO implementation
  • Qwen Team: For the excellent base model
  • Hugging Face: For the transformers library and model hosting

Note: This model is for educational and research purposes. Always consult qualified legal professionals for actual legal advice.
