Akshara-2B-Hindi Language Model

Overview

Akshara-2B-Hindi is a causal language model optimized for Hindi and English text processing. This 2B-class, decoder-only model is designed for natural language understanding and generation tasks in both languages.

Model Architecture

Core Specifications

  • Base Architecture: AksharaForCausalLM
  • Model Type: Causal Language Model (akshara)
  • Hidden Size: 2048
  • Number of Layers: 18
  • Attention Heads: 8
  • Key-Value Heads: 1
  • Intermediate Size: 16384
  • Head Dimension: 256
  • Vocabulary Size: 256,000 tokens
  • Maximum Sequence Length: 8,192 tokens
  • Parameters: ~2.5 billion (2.51B in the released safetensors checkpoint)
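
The figures above can be checked against the published configuration. A minimal verification sketch, assuming the config exposes the standard transformers field names (a custom AksharaForCausalLM config may differ):

from transformers import AutoConfig

# Add trust_remote_code=True if the akshara architecture ships as custom code.
config = AutoConfig.from_pretrained("SVECTOR-CORPORATION/Akshara-2B-Hindi")
print(config.hidden_size)              # expected: 2048
print(config.num_hidden_layers)        # expected: 18
print(config.num_attention_heads)      # expected: 8
print(config.num_key_value_heads)      # expected: 1
print(config.vocab_size)               # expected: 256000
print(config.max_position_embeddings)  # expected: 8192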

Technical Details

  • Attention Mechanism:
    • Attention Bias: Disabled
    • Attention Dropout: 0.0
  • Normalization and Initialization:
    • RMS Norm Epsilon: 1e-06
    • Initializer Range: 0.02
  • Positional Encoding:
    • RoPE Theta: 10000.0
    • RoPE Scaling: None
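
For reference, RMSNorm with the epsilon above can be sketched in a few lines of PyTorch. This is a generic illustration of the technique, not the model's own implementation:

import torch

def rms_norm(x: torch.Tensor, weight: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    # Scale by the reciprocal root-mean-square of the last dimension, then
    # apply a learned per-channel gain (no mean subtraction, no bias term).
    variance = x.pow(2).mean(dim=-1, keepdim=True)
    return x * torch.rsqrt(variance + eps) * weight

x = torch.randn(1, 4, 2048)       # (batch, seq_len, hidden_size)
weight = torch.ones(2048)         # learned gain, initialized to 1
print(rms_norm(x, weight).shape)  # torch.Size([1, 4, 2048])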

Special Tokens

  • BOS Token ID: 2
  • EOS Token ID: 1
  • PAD Token ID: 0
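
Once the tokenizer is downloaded, these IDs can be sanity-checked with the standard tokenizer attributes:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("SVECTOR-CORPORATION/Akshara-2B-Hindi")
print(tokenizer.bos_token_id)  # expected: 2
print(tokenizer.eos_token_id)  # expected: 1
print(tokenizer.pad_token_id)  # expected: 0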

Implementation Details

  • Activation Function: GELU
  • Model Dtype: float16
  • Cache Usage: Enabled
  • Transformers Version: 4.38.1
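
Note that transformers loads weights in float32 by default; to keep the stored float16 precision, pass torch_dtype explicitly. A minimal sketch:

import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "SVECTOR-CORPORATION/Akshara-2B-Hindi",
    torch_dtype=torch.float16,  # match the stored fp16 weights
)
print(model.dtype)             # torch.float16
print(model.config.use_cache)  # True: KV caching enabled for generation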

Usage

Installation

pip install "transformers>=4.38.1" torch

Loading the Model

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "SVECTOR-CORPORATION/Akshara-2B-Hindi"
tokenizer = AutoTokenizer.from_pretrained(model_name)
# If the custom akshara architecture is not bundled with your transformers
# release, pass trust_remote_code=True to from_pretrained.
model = AutoModelForCausalLM.from_pretrained(model_name)

Basic Text Generation

text = "आज का मौसम"  # Example Hindi text: "Today's weather"
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(
    **inputs,  # pass input_ids and attention_mask together
    max_length=100,
    temperature=0.7,
    do_sample=True,
)
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text)

Features

  • Bilingual Capabilities: Optimized for both Hindi and English text processing
  • Long Context Window: Supports sequences up to 8,192 tokens
  • Efficient Architecture: A single key-value head (multi-query attention) shrinks the KV cache and speeds up decoding; see the sketch after this list
  • Large Vocabulary: 256,000 token vocabulary supporting diverse Hindi and English text
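
The single key-value head makes the memory savings concrete. A back-of-the-envelope calculation using the figures above (fp16, 18 layers, head dimension 256, full 8,192-token context):

layers, head_dim, seq_len = 18, 256, 8192
bytes_per_value = 2  # fp16

def kv_cache_bytes(kv_heads: int) -> int:
    # Keys and values (hence the factor of 2) for every layer and position.
    return 2 * layers * seq_len * kv_heads * head_dim * bytes_per_value

print(kv_cache_bytes(1) // 2**20)  # 144 MiB with the model's single KV head
print(kv_cache_bytes(8) // 2**20)  # 1152 MiB if all 8 query heads kept their own K/V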

Performance Considerations

  • Model weights are stored in float16 format for efficient memory usage
  • Attention projections omit bias terms, reducing parameter count and compute
  • KV caching is enabled by default for faster autoregressive inference (see the sketch after this list)
  • Uses RMSNorm with epsilon of 1e-06 for stable training
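
Caching can also be toggled per call. A quick sketch, reusing the model and tokenizer loaded above:

inputs = tokenizer("आज का मौसम", return_tensors="pt")
# Default: each decoding step reuses the cached keys/values of the prefix.
fast = model.generate(**inputs, max_new_tokens=50, use_cache=True)
# Without the cache, every step recomputes attention over the whole prefix.
slow = model.generate(**inputs, max_new_tokens=50, use_cache=False)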

Limitations

  • Maximum context length of 8,192 tokens
  • Primarily optimized for Hindi and English languages
  • float16 precision may affect numerical stability in some cases

Citation

If you use this model in your research, please cite:

@misc{akshara2b2025,
  title={Akshara-2B-Hindi: A Bilingual Language Model},
  author={SVECTOR CORPORATION},
  year={2025},
}

License

MIT License

Copyright (c) 2025 SVECTOR CORPORATION
