Akshara-2B-Hindi Language Model

Overview

Akshara-2B-Hindi is a causal language model optimized for Hindi and English text processing. This 2B-class, decoder-only model is designed for natural language understanding and generation tasks in both languages.

Model Architecture

Core Specifications

  • Base Architecture: AksharaForCausalLM
  • Model Type: Causal Language Model (akshara)
  • Hidden Size: 2048
  • Number of Layers: 18
  • Attention Heads: 8
  • Key-Value Heads: 1
  • Intermediate Size: 16384
  • Head Dimension: 256
  • Vocabulary Size: 256,000 tokens
  • Maximum Sequence Length: 8,192 tokens
  • Parameters: ~2.5 billion (2.51B in the released safetensors checkpoint)
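
The figures above can be checked against the published configuration. A minimal verification sketch, assuming the config exposes the standard transformers field names (a custom AksharaForCausalLM config may differ):

from transformers import AutoConfig

# Add trust_remote_code=True if the akshara architecture ships as custom code.
config = AutoConfig.from_pretrained("SVECTOR-CORPORATION/Akshara-2B-Hindi")
print(config.hidden_size)              # expected: 2048
print(config.num_hidden_layers)        # expected: 18
print(config.num_attention_heads)      # expected: 8
print(config.num_key_value_heads)      # expected: 1
print(config.vocab_size)               # expected: 256000
print(config.max_position_embeddings)  # expected: 8192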

Technical Details

  • Attention Mechanism:
    • Attention Bias: Disabled
    • Attention Dropout: 0.0
  • Normalization and Initialization:
    • RMS Norm Epsilon: 1e-06
    • Initializer Range: 0.02
  • Positional Encoding:
    • RoPE Theta: 10000.0
    • RoPE Scaling: None
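
For reference, RMSNorm with the epsilon above can be sketched in a few lines of PyTorch. This is a generic illustration of the technique, not the model's own implementation:

import torch

def rms_norm(x: torch.Tensor, weight: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    # Scale by the reciprocal root-mean-square of the last dimension, then
    # apply a learned per-channel gain (no mean subtraction, no bias term).
    variance = x.pow(2).mean(dim=-1, keepdim=True)
    return x * torch.rsqrt(variance + eps) * weight

x = torch.randn(1, 4, 2048)       # (batch, seq_len, hidden_size)
weight = torch.ones(2048)         # learned gain, initialized to 1
print(rms_norm(x, weight).shape)  # torch.Size([1, 4, 2048])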

Special Tokens

  • BOS Token ID: 2
  • EOS Token ID: 1
  • PAD Token ID: 0
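
Once the tokenizer is downloaded, these IDs can be sanity-checked with the standard tokenizer attributes:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("SVECTOR-CORPORATION/Akshara-2B-Hindi")
print(tokenizer.bos_token_id)  # expected: 2
print(tokenizer.eos_token_id)  # expected: 1
print(tokenizer.pad_token_id)  # expected: 0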

Implementation Details

  • Activation Function: GELU
  • Model Dtype: float16
  • Cache Usage: Enabled
  • Transformers Version: 4.38.1
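
Note that transformers loads weights in float32 by default; to keep the stored float16 precision, pass torch_dtype explicitly. A minimal sketch:

import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "SVECTOR-CORPORATION/Akshara-2B-Hindi",
    torch_dtype=torch.float16,  # match the stored fp16 weights
)
print(model.dtype)             # torch.float16
print(model.config.use_cache)  # True: KV caching enabled for generation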

Usage

Installation

pip install "transformers>=4.38.1" torch

Loading the Model

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "SVECTOR-CORPORATION/Akshara-2B-Hindi"
tokenizer = AutoTokenizer.from_pretrained(model_name)
# If the custom akshara architecture is not bundled with your transformers
# release, pass trust_remote_code=True to from_pretrained.
model = AutoModelForCausalLM.from_pretrained(model_name)

Basic Text Generation

text = "आज का मौसम"  # Example Hindi text: "Today's weather"
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(
    **inputs,  # pass input_ids and attention_mask together
    max_length=100,
    temperature=0.7,
    do_sample=True,
)
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text)

Features

  • Bilingual Capabilities: Optimized for both Hindi and English text processing
  • Long Context Window: Supports sequences up to 8,192 tokens
  • Efficient Architecture: A single key-value head (multi-query attention) shrinks the KV cache and speeds up decoding; see the sketch after this list
  • Large Vocabulary: 256,000 token vocabulary supporting diverse Hindi and English text
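
The single key-value head makes the memory savings concrete. A back-of-the-envelope calculation using the figures above (fp16, 18 layers, head dimension 256, full 8,192-token context):

layers, head_dim, seq_len = 18, 256, 8192
bytes_per_value = 2  # fp16

def kv_cache_bytes(kv_heads: int) -> int:
    # Keys and values (hence the factor of 2) for every layer and position.
    return 2 * layers * seq_len * kv_heads * head_dim * bytes_per_value

print(kv_cache_bytes(1) // 2**20)  # 144 MiB with the model's single KV head
print(kv_cache_bytes(8) // 2**20)  # 1152 MiB if all 8 query heads kept their own K/V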

Performance Considerations

  • Model weights are stored in float16 format for efficient memory usage
  • Attention projections omit bias terms, reducing parameter count and compute
  • KV caching is enabled by default for faster autoregressive inference (see the sketch after this list)
  • Uses RMSNorm with epsilon of 1e-06 for stable training
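
Caching can also be toggled per call. A quick sketch, reusing the model and tokenizer loaded above:

inputs = tokenizer("आज का मौसम", return_tensors="pt")
# Default: each decoding step reuses the cached keys/values of the prefix.
fast = model.generate(**inputs, max_new_tokens=50, use_cache=True)
# Without the cache, every step recomputes attention over the whole prefix.
slow = model.generate(**inputs, max_new_tokens=50, use_cache=False)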

Limitations

  • Maximum context length of 8,192 tokens
  • Primarily optimized for Hindi and English languages
  • float16 precision may affect numerical stability in some cases

Citation

If you use this model in your research, please cite:

@misc{akshara2b2025,
  title={Akshara-2B-Hindi: A Bilingual Language Model},
  author={SVECTOR CORPORATION},
  year={2025},
}

License

MIT License

Copyright (c) 2025 SVECTOR CORPORATION
