shivi101/tulu3-control-ssm50-50

Model Description

This is a distilled hybrid Mamba-Transformer model trained on the Tulu-3 SFT mixture. It was produced by knowledge distillation from Llama-3.2-3B-Instruct and is intended for long-context understanding tasks.

Architecture

Base Model: meta-llama/Llama-3.2-3B-Instruct
Architecture: Hybrid Mamba-Transformer
Training Method: Knowledge Distillation
Context Length: Up to 16K tokens
Parameters: ~3B
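In a hybrid Mamba-Transformer, the Mamba layers replace attention's per-token key-value cache with a fixed-size recurrent state. That constant memory cost is what makes these layers attractive for long contexts. The sketch below shows the recurrence in its simplest diagonal, non-selective form. It illustrates the idea and is not this model's implementation: real Mamba layers make the step size and the B and C projections depend on the input (the "selective" part) and compute the scan in parallel on the GPU. The function name and parameter values here are hypothetical.

```python
from typing import List


def diagonal_ssm_scan(x: List[float], a: List[float], b: List[float], c: List[float]) -> List[float]:
    """Run a simplified diagonal linear state-space recurrence over a 1-D sequence.

    h_t = a * h_{t-1} + b * x_t   (element-wise over the state dimension)
    y_t = sum(c * h_t)
    Parameters are fixed here; real Mamba makes them input-dependent.
    """
    if not (len(a) == len(b) == len(c)):
        raise ValueError("a, b and c must share the state dimension")
    h = [0.0] * len(a)  # fixed-size state: does not grow with sequence length
    ys = []
    for x_t in x:
        h = [a_i * h_i + b_i * x_t for a_i, h_i, b_i in zip(a, h, b)]
        ys.append(sum(c_i * h_i for c_i, h_i in zip(c, h)))
    return ys
```

For an impulse input `[1.0, 0.0, 0.0]` with `a=[0.5]`, `b=[1.0]` and `c=[1.0]`, the output decays geometrically: `[1.0, 0.5, 0.25]`.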

Training Details

Teacher Model: Llama-3.2-3B-Instruct
Student Model: Hybrid SSM architecture
Dataset: Tulu-3 SFT Mixture
Sequence Length: 8K-16K tokens
Training Framework: Axolotl + DeepSpeed ZeRO-2
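The card does not specify the distillation objective. A common choice, assumed here, is the KL divergence between the teacher's and student's temperature-softened next-token distributions, scaled by T² so that gradient magnitudes stay comparable across temperatures. The following minimal standard-library sketch covers a single token position. `distillation_loss` and its default temperature of 2.0 are hypothetical. A real training run would average this loss over all positions in a batch using the framework's tensor ops.

```python
import math
from typing import List


def log_softmax(logits: List[float], temperature: float) -> List[float]:
    """Numerically stable log-softmax of temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max before exponentiating to avoid overflow
    log_total = m + math.log(sum(math.exp(z - m) for z in scaled))
    return [z - log_total for z in scaled]


def distillation_loss(student_logits: List[float], teacher_logits: List[float],
                      temperature: float = 2.0) -> float:
    """T^2 * KL(teacher || student) over softened distributions (assumed objective)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    log_t = log_softmax(teacher_logits, temperature)
    log_s = log_softmax(student_logits, temperature)
    # KL(p_t || p_s) = sum_i p_t(i) * (log p_t(i) - log p_s(i))
    kl = sum(math.exp(lt) * (lt - ls) for lt, ls in zip(log_t, log_s))
    return temperature ** 2 * kl
```

The loss is zero when the student's logits match the teacher's and grows as their softened distributions diverge.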

Usage

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "shivi101/tulu3-control-ssm50-50"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)

# Example usage
prompt = "Explain the concept of knowledge distillation in machine learning."
# Move the encoded inputs to the same device as the model
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
# do_sample=True is needed for temperature to take effect;
# max_new_tokens limits only the generated text, not prompt + output
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.7)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
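The card states a context length of up to 16K tokens, so long inputs may need trimming to leave room for the generated tokens. The following is a minimal sketch. It assumes a 16,384-token window and left truncation, which drops the oldest tokens and keeps a question placed at the end of the prompt. The helper `fit_to_context` is hypothetical and not part of transformers.

```python
from typing import List


def fit_to_context(token_ids: List[int], max_new_tokens: int,
                   context_length: int = 16384) -> List[int]:
    """Left-truncate token_ids so that prompt + generated tokens fit the window.

    context_length=16384 is an assumption based on the card's "up to 16K tokens".
    """
    if max_new_tokens >= context_length:
        raise ValueError("max_new_tokens must be smaller than context_length")
    budget = context_length - max_new_tokens
    # Keep only the most recent `budget` tokens
    return token_ids[-budget:] if len(token_ids) > budget else token_ids
```

You could apply it to `tokenizer(prompt)["input_ids"]` before generation. Whether left truncation is appropriate depends on where the relevant content sits in your prompt.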

Evaluation

This model is intended for evaluation on the Loong benchmark, which tests long-context understanding through four task types:

Spotlight Locating
Comparison
Clustering
Chain of Reasoning

Model Files

config.json: Model configuration
mamba_config.json: Mamba-specific configuration
model.safetensors: Model weights
tokenizer.json: Tokenizer definition (vocabulary and merges)
generation_config.json: Generation parameters
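To see how the hybrid layers are configured, you can inspect the JSON files listed above directly. The following is a minimal standard-library sketch. `load_model_configs` is a hypothetical helper. The card does not document the keys inside `mamba_config.json`, so the helper returns the parsed dictionaries as-is rather than assuming a schema.

```python
import json
from pathlib import Path
from typing import Any, Dict

# Configuration files named on this card
CONFIG_FILES = ("config.json", "mamba_config.json", "generation_config.json")


def load_model_configs(model_dir: str) -> Dict[str, Any]:
    """Parse whichever of the card's JSON config files exist in model_dir."""
    configs: Dict[str, Any] = {}
    for name in CONFIG_FILES:
        path = Path(model_dir) / name
        if path.is_file():  # skip files absent from a partial download
            configs[name] = json.loads(path.read_text(encoding="utf-8"))
    return configs
```

`model_dir` can be the local path returned by `huggingface_hub.snapshot_download(repo_id="shivi101/tulu3-control-ssm50-50")`.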

Citation

@misc{your_model_name_2024,
  title={Distilled Hybrid Mamba-Transformer for Long-Context Understanding},
  author={Your Name},
  year={2024},
  url={https://huggingface.co/shivi101/tulu3-control-ssm50-50}
}

Model Weights

Format: Safetensors
Model size: 3B params
Tensor type: BF16
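At BF16 precision, each parameter occupies 2 bytes. The weights alone therefore take roughly 3 × 10⁹ × 2 bytes ≈ 6 GB, or about 5.6 GiB. The hypothetical helper below makes that arithmetic explicit. It ignores activations, the attention layers' KV cache, and the Mamba layers' recurrent state, all of which add to inference memory.

```python
def weight_memory_gib(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate size of the model weights alone, in GiB (BF16 = 2 bytes/param)."""
    return num_params * bytes_per_param / 2**30


# ~3B parameters in BF16
print(f"{weight_memory_gib(3e9):.2f} GiB")  # prints "5.59 GiB"
```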

