InLawMate-peft: Indian Legal Domain PEFT Model

Model Description

InLawMate-peft is a Parameter-Efficient Fine-Tuned (PEFT) language model specifically optimized for understanding and reasoning about Indian legal documentation. The model was trained on a carefully curated dataset of nearly 7,000 question-answer pairs derived from Indian criminal law documentation, making it particularly adept at legal comprehension and explanation tasks.

Training Data

The training data consists of nearly 7,000 legal Q&A pairs generated in two stages:

Question Generation: Questions were extracted to cover key legal concepts, definitions, procedures, and roles, ensuring comprehensive coverage of:
- Legal terminology and definitions
- Procedural rules and steps
- Rights and penalties
- Jurisdictional aspects
- Roles of legal entities (judges, lawyers, law enforcement)

Answer Generation: Answers were crafted following a structured legal reasoning approach, ensuring:
- Legal precision and accuracy
- Comprehensive coverage of relevant points
- Clear explanation of legal concepts
- Professional legal discourse style
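
The two stages described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the pipeline actually used for this model: the `llm` callable, the prompt wording, and the names `QAPair`, `generate_questions`, `generate_answer`, and `build_qa_pairs` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical type: any function that maps a prompt string to generated text.
LLM = Callable[[str], str]


@dataclass
class QAPair:
    question: str
    answer: str


def generate_questions(passage: str, llm: LLM) -> List[str]:
    """Stage 1: ask the model for questions about a passage, one per line."""
    prompt = (
        "Write questions covering the definitions, procedures, rights, penalties, "
        "jurisdiction, and roles of legal entities in this passage, one per line:\n\n"
        + passage
    )
    # Keep non-empty lines only, with surrounding whitespace removed.
    return [line.strip() for line in llm(prompt).splitlines() if line.strip()]


def generate_answer(passage: str, question: str, llm: LLM) -> str:
    """Stage 2: answer one question using only the source passage."""
    prompt = (
        "Using only the passage below, answer the question precisely and in a "
        "professional legal register.\n\n"
        f"Passage:\n{passage}\n\nQuestion: {question}"
    )
    return llm(prompt).strip()


def build_qa_pairs(passage: str, llm: LLM) -> List[QAPair]:
    """Run stage 1, then stage 2 once per generated question."""
    questions = generate_questions(passage, llm)
    return [QAPair(q, generate_answer(passage, q, llm)) for q in questions]
```

Keeping the two stages as separate functions makes it possible to review or filter the questions before any answers are generated.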

Training Details

Base Model: allenai/Llama-3.1-Tulu-3-8B
Fine-Tuning Method: PEFT (Parameter-Efficient Fine-Tuning)
Training Epochs: 3
Batch Size: 2 (with gradient accumulation steps of 4)
Learning Rate: 3e-05 with cosine scheduler
Sequence Length: 1024 tokens
Mixed Precision: BF16
Optimization: AdamW with β1=0.9, β2=0.999
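
To make the batch-size arithmetic concrete, the sketch below collects the listed hyperparameters under the keyword names used by `transformers.TrainingArguments` and derives the effective batch size and an approximate step count. The single-device setup and the round figure of 7,000 examples are assumptions made for illustration; neither is stated in this card.

```python
import math

# Hyperparameters as listed above; key names follow transformers.TrainingArguments.
training_kwargs = {
    "num_train_epochs": 3,
    "per_device_train_batch_size": 2,
    "gradient_accumulation_steps": 4,
    "learning_rate": 3e-5,
    "lr_scheduler_type": "cosine",
    "bf16": True,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
}
max_seq_length = 1024  # applied when tokenizing, not a TrainingArguments field

# Assumptions (not stated in this card): one device and exactly 7,000 examples.
num_devices = 1
num_examples = 7_000

# Examples consumed per optimizer update.
effective_batch_size = (
    training_kwargs["per_device_train_batch_size"]
    * training_kwargs["gradient_accumulation_steps"]
    * num_devices
)
steps_per_epoch = math.ceil(num_examples / effective_batch_size)
total_steps = steps_per_epoch * training_kwargs["num_train_epochs"]

print(f"effective batch size: {effective_batch_size}")   # 8
print(f"optimizer steps per epoch: {steps_per_epoch}")   # 875
print(f"total optimizer steps: {total_steps}")           # 2625
```

Under these assumptions, each optimizer update sees 8 examples, and the cosine schedule decays the learning rate over roughly 2,625 updates.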

Use Cases

This model is particularly suited for:

- Legal document analysis and comprehension
- Answering questions about Indian criminal law
- Understanding legal procedures and requirements
- Explaining legal concepts and terminology
- Assisting in legal research and education

Limitations

- The model is trained specifically on Indian criminal law documentation, so it may not generalize to other jurisdictions or areas of law
- Responses should be verified by legal professionals for critical applications
- The model should not be used as a substitute for professional legal advice

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

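# If the repository stores only adapter weights, the `peft` package must be installed for this call to work.
model = AutoModelForCausalLM.from_pretrained(
    "Aryaman02/InLawMate-peft",
    device_map="auto",
    torch_dtype="auto"
).eval()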

tokenizer = AutoTokenizer.from_pretrained("Aryaman02/InLawMate-peft")

# Example legal query
messages = [
    {"role": "user", "content": "What are the requirements for cross-examination according to Indian law?"}
]

input_ids = tokenizer.apply_chat_template(
    conversation=messages, 
    tokenize=True, 
    add_generation_prompt=True, 
    return_tensors='pt'
)
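# Send inputs to the model's own device; max_new_tokens=512 is an illustrative cap, adjust as needed.
output_ids = model.generate(input_ids.to(model.device), max_new_tokens=512)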
response = tokenizer.decode(output_ids[0][input_ids.shape[1]:], skip_special_tokens=True)
print(response)
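
Because this is a PEFT model, it can also help to see the adapter attached to the base model explicitly. The sketch below assumes that the `Aryaman02/InLawMate-peft` repository hosts adapter weights rather than merged weights, and that the `peft` package is installed; the helper names `build_messages` and `load_with_adapter` are hypothetical.

```python
BASE_MODEL_ID = "allenai/Llama-3.1-Tulu-3-8B"  # base model named in Training Details
ADAPTER_ID = "Aryaman02/InLawMate-peft"        # assumed to host the adapter weights


def build_messages(question: str) -> list:
    """Wrap a single user question in the chat-message format used above."""
    return [{"role": "user", "content": question}]


def load_with_adapter():
    """Load the base model, then attach the adapter on top of it."""
    # Imported lazily so build_messages can be used without these packages.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    base = AutoModelForCausalLM.from_pretrained(
        BASE_MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
    )
    model = PeftModel.from_pretrained(base, ADAPTER_ID).eval()
    tokenizer = AutoTokenizer.from_pretrained(ADAPTER_ID)
    return model, tokenizer
```

Calling `model, tokenizer = load_with_adapter()` yields objects that can be used with `apply_chat_template` and `generate` in the same way as in the example above.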

Citation

If you use this model in your research, please cite:

@misc{legalpara-lm,
  title={InLawMate: A PEFT Model for Indian Legal Domain Understanding},
  year={2024},
  publisher={Aryaman},
  note={Model trained on Indian legal documentation}
}

Our training data and the procedure for creating the synthetic data are outlined in https://github.com/DarryCrucian/law-llm
