shivi101/tulu3-control-ssm50-50

Model Description

This is a distilled model based on Tulu-3 with a hybrid Mamba-Transformer architecture. It was trained with knowledge distillation from Llama-3.2-3B-Instruct and is intended for long-context understanding tasks.

Architecture

  • Base Model: Llama-3.2-3B-Instruct
  • Architecture: Hybrid Mamba-Transformer
  • Training Method: Knowledge Distillation
  • Context Length: Up to 16K tokens
  • Parameters: ~3B

Training Details

  • Teacher Model: Llama-3.2-3B-Instruct
  • Student Model: Hybrid SSM architecture
  • Dataset: Tulu-3 SFT Mixture
  • Sequence Length: 8K-16K tokens
  • Training Framework: Axolotl + DeepSpeed ZeRO-2
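
The card does not state which distillation objective was used. As an illustration only, the following minimal sketch shows a common choice: the KL divergence between temperature-softened teacher and student distributions at a single token position. The function names and the temperature value are assumptions, not details of this model's training run.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions for one token position.

    Hypothetical helper: the temperature of 2.0 is an illustrative default.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q) if pi > 0)
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    return kl * temperature ** 2
```

In a real training loop this quantity would be averaged over all token positions and is often mixed with the ordinary cross-entropy loss on the ground-truth tokens.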

Usage

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "shivi101/tulu3-control-ssm50-50"
tokenizer = AutoTokenizer.from_pretrained(model_name)
# Load the weights in bfloat16
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)

# Example usage
prompt = "Explain the concept of knowledge distillation in machine learning."
inputs = tokenizer(prompt, return_tensors="pt")
# do_sample=True is required for temperature to take effect
outputs = model.generate(**inputs, max_length=512, do_sample=True, temperature=0.7)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
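
Note that `max_length` in `generate` counts the prompt tokens plus the generated tokens, so a long prompt leaves little or no room for output. For long-context inputs it can be clearer to budget the generated tokens explicitly. The helper below is a hypothetical sketch; it assumes "16K" means 16 * 1024 tokens, which the card does not confirm.

```python
CONTEXT_LIMIT = 16_384  # assumption: "16K" read as 16 * 1024 tokens

def new_token_budget(prompt_tokens, requested_new_tokens, context_limit=CONTEXT_LIMIT):
    """Return how many tokens can be generated without exceeding the context limit."""
    if prompt_tokens < 0 or requested_new_tokens < 0:
        raise ValueError("token counts must be non-negative")
    # Room left in the context window after the prompt, never negative.
    remaining = max(context_limit - prompt_tokens, 0)
    return min(requested_new_tokens, remaining)
```

The result could then be passed as `max_new_tokens` in place of `max_length`, using `inputs["input_ids"].shape[1]` as the prompt length. A budget of zero means the prompt itself needs to be shortened first.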

Evaluation

This model is designed for evaluation on the Loong benchmark, which tests long-context understanding across:

  • Spotlight Locating
  • Comparison tasks
  • Clustering
  • Chain of Reasoning

Model Files

  • config.json: Model configuration
  • mamba_config.json: Mamba-specific configuration
  • model.safetensors: Model weights
  • tokenizer.json: Tokenizer vocabulary and settings
  • generation_config.json: Generation parameters
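
To see what the configuration files contain without loading the weights, a small script can list their top-level keys. This is a minimal sketch that assumes the repository has already been downloaded to a local directory and that each file holds a JSON object; `summarize_configs` is a hypothetical helper, and no particular key names are assumed.

```python
import json
from pathlib import Path

CONFIG_FILES = ("config.json", "mamba_config.json", "generation_config.json")

def summarize_configs(model_dir):
    """Map each config file name to its sorted top-level keys (None if missing)."""
    summary = {}
    for name in CONFIG_FILES:
        path = Path(model_dir) / name
        if path.is_file():
            with path.open(encoding="utf-8") as handle:
                # Sorting a dict yields its keys in alphabetical order.
                summary[name] = sorted(json.load(handle))
        else:
            summary[name] = None
    return summary
```

Calling `summarize_configs("path/to/local/model")` returns a dictionary that can be printed to compare the standard configuration with the Mamba-specific one.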

Citation

@misc{your_model_name_2024,
  title={Distilled Hybrid Mamba-Transformer for Long-Context Understanding},
  author={Your Name},
  year={2024},
  url={https://huggingface.co/shivi101/tulu3-control-ssm50-50}
}