InLawMate-peft: Indian Legal Domain PEFT Model

Model Description

InLawMate-peft is a Parameter-Efficient Fine-Tuned (PEFT) language model specifically optimized for understanding and reasoning about Indian legal documentation. The model was trained on a carefully curated dataset of nearly 7,000 question-answer pairs derived from Indian criminal law documentation, making it particularly adept at legal comprehension and explanation tasks.

Training Data

The training data consists of nearly 7,000 high-quality legal Q&A pairs generated with a two-stage process (a minimal sketch of the pipeline follows this list):

  1. Question Generation: Questions were generated from the source documents to cover key legal concepts, definitions, procedures, and roles, ensuring comprehensive coverage of:

    • Legal terminology and definitions
    • Procedural rules and steps
    • Rights and penalties
    • Jurisdictional aspects
    • Roles of legal entities (judges, lawyers, law enforcement)
  2. Answer Generation: Answers were crafted following a structured legal reasoning approach, ensuring:

    • Legal precision and accuracy
    • Comprehensive coverage of relevant points
    • Clear explanation of legal concepts
    • Professional legal discourse style
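
The repository linked below documents the exact prompts and models used; the following is a minimal, hypothetical sketch of the two-stage pipeline, where generate(prompt) stands in for whatever LLM endpoint produced the questions and answers:

# Hypothetical sketch of the two-stage Q&A generation described above.
# `generate(prompt) -> str` is a stand-in for the actual LLM call.

def generate_qa_pairs(passage, generate):
    # Stage 1: draft questions covering definitions, procedures,
    # rights and penalties, jurisdiction, and roles of legal entities.
    question_prompt = (
        "From the following excerpt of Indian criminal law, write questions "
        "covering key concepts, definitions, procedures, rights and "
        "penalties, jurisdictional aspects, and the roles of legal "
        "entities, one per line.\n\n" + passage
    )
    questions = [q.strip() for q in generate(question_prompt).splitlines() if q.strip()]

    # Stage 2: answer each question with structured legal reasoning.
    pairs = []
    for question in questions:
        answer_prompt = (
            "Answer the question using only the excerpt. Be legally precise, "
            "cover all relevant points, explain concepts clearly, and use a "
            "professional legal discourse style.\n\n"
            f"Excerpt:\n{passage}\n\nQuestion: {question}"
        )
        pairs.append({"question": question, "answer": generate(answer_prompt)})
    return pairs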

Training Details

  • Base Model: allenai/Llama-3.1-Tulu-3-8B
  • Fine-Tuning Method: PEFT (Parameter-Efficient Fine-Tuning)
  • Training Epochs: 3
  • Batch Size: 2 (with gradient accumulation steps of 4)
  • Learning Rate: 3e-05 with cosine scheduler
  • Sequence Length: 1024 tokens
  • Mixed Precision: BF16
  • Optimization: AdamW with β1=0.9, β2=0.999
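
The card does not name the specific PEFT method, so the sketch below assumes LoRA (the most common choice) with illustrative rank, alpha, and target-module values, and maps the hyperparameters above onto Hugging Face transformers and peft:

from transformers import AutoModelForCausalLM, TrainingArguments
from peft import LoraConfig, get_peft_model

# Base model in BF16, matching the mixed-precision setting above.
base = AutoModelForCausalLM.from_pretrained(
    "allenai/Llama-3.1-Tulu-3-8B", torch_dtype="bfloat16"
)

# LoRA hyperparameters are assumptions; the card only states "PEFT".
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],  # typical Llama attention projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)

training_args = TrainingArguments(
    output_dir="inlawmate-peft",
    num_train_epochs=3,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,   # effective batch size of 8
    learning_rate=3e-5,
    lr_scheduler_type="cosine",
    bf16=True,
    adam_beta1=0.9,
    adam_beta2=0.999,
)
# Inputs are tokenized and truncated to the 1024-token sequence length
# before being passed to a standard Trainer run.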

Use Cases

This model is particularly suited for:

  • Legal document analysis and comprehension
  • Answering questions about Indian criminal law
  • Understanding legal procedures and requirements
  • Explaining legal concepts and terminology
  • Assisting in legal research and education

Limitations

  • The model is trained only on Indian legal documentation and may not generalize to other jurisdictions or legal domains
  • Responses should be verified by legal professionals for critical applications
  • The model should not be used as a substitute for professional legal advice

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "Aryaman02/InLawMate-peft",
    device_map="auto",
    torch_dtype="auto",
).eval()

tokenizer = AutoTokenizer.from_pretrained("Aryaman02/InLawMate-peft")

# Example legal query
messages = [
    {"role": "user", "content": "What are the requirements for cross-examination according to Indian law?"}
]

input_ids = tokenizer.apply_chat_template(
    conversation=messages, 
    tokenize=True, 
    add_generation_prompt=True, 
    return_tensors='pt'
)
output_ids = model.generate(input_ids.to(model.device), max_new_tokens=512)  # cap the response length
response = tokenizer.decode(output_ids[0][input_ids.shape[1]:], skip_special_tokens=True)
print(response)
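
If the repository is published as a standalone PEFT adapter, recent transformers versions (with peft installed) resolve the base model automatically in the load above; alternatively, peft's AutoPeftModelForCausalLM loads the base model and adapter together:

from peft import AutoPeftModelForCausalLM

# Loads the base model referenced by the adapter config and applies the adapter.
model = AutoPeftModelForCausalLM.from_pretrained(
    "Aryaman02/InLawMate-peft",
    device_map="auto",
    torch_dtype="auto",
).eval()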

Citation

If you use this model in your research, please cite:

@misc{inlawmate-peft,
  title={InLawMate: A PEFT Model for Indian Legal Domain Understanding},
  author={Aryaman},
  year={2024},
  note={Model trained on Indian legal documentation}
}

Our training data and the procedure for synthetic data creation are outlined at https://github.com/DarryCrucian/law-llm
