shivi101/tulu3-control-ssm50-50
Model Description
This is a distilled model based on Tulu-3 with a hybrid Mamba-Transformer architecture. It was trained with knowledge distillation from Llama-3.2-3B-Instruct and is optimized for long-context understanding tasks.
Architecture
- Base Model: Llama-3.2-3B-Instruct
- Architecture: Hybrid Mamba-Transformer
- Training Method: Knowledge Distillation
- Context Length: Up to 16K tokens
- Parameters: ~3B
Training Details
- Teacher Model: Llama-3.2-3B-Instruct
- Student Model: Hybrid SSM architecture
- Dataset: Tulu-3 SFT Mixture
- Sequence Length: 8K-16K tokens
- Training Framework: Axolotl + DeepSpeed ZeRO-2
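The card does not state which distillation objective was used. As a rough illustration of what knowledge distillation from a teacher to a student usually involves, the sketch below computes a temperature-scaled KL divergence between teacher and student logits for one token position. The function names, the temperature value, and the T^2 scaling are assumptions for illustration, not details taken from this model's training run.

```python
import math

def log_softmax(logits, temperature=1.0):
    """Temperature-scaled log-softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - log_z for x in scaled]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) for one token position, scaled by T^2.

    Hypothetical helper: the actual loss used to train this model is not
    documented in the card.
    """
    if len(teacher_logits) != len(student_logits):
        raise ValueError("teacher and student logits must have the same length")
    log_p = log_softmax(teacher_logits, temperature)  # teacher distribution
    log_q = log_softmax(student_logits, temperature)  # student distribution
    kl = sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
    return kl * temperature ** 2
```

The loss is zero when the student reproduces the teacher's logits exactly and grows as the two distributions diverge. In a real training setup this would be averaged over all token positions in a batch and typically computed with a tensor library.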
Usage
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "shivi101/tulu3-control-ssm50-50"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)

# Example usage
prompt = "Explain the concept of knowledge distillation in machine learning."
inputs = tokenizer(prompt, return_tensors="pt")
# do_sample=True is needed for temperature to take effect
outputs = model.generate(**inputs, max_length=512, do_sample=True, temperature=0.7)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
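For long inputs, the prompt plus the generated tokens should stay within the stated context length of up to 16K tokens. A minimal sketch of a budget check follows, assuming "16K" means 16 * 1024 tokens and that dropping the oldest tokens is acceptable. The helper name and the default values are hypothetical.

```python
MAX_CONTEXT_TOKENS = 16 * 1024  # assumption: "16K" read as 16,384 tokens

def fit_to_context(input_ids, max_new_tokens=512, max_context=MAX_CONTEXT_TOKENS):
    """Trim a list of token ids so that prompt + generation fits the window.

    Keeps the most recent tokens (left truncation). Hypothetical helper,
    not part of the model's own tooling.
    """
    budget = max_context - max_new_tokens
    if budget <= 0:
        raise ValueError("max_new_tokens leaves no room for the prompt")
    if len(input_ids) > budget:
        return list(input_ids[-budget:])
    return list(input_ids)
```

The function works on a plain list of token ids, such as `tokenizer(prompt)["input_ids"]`. Left truncation also drops any leading special tokens or instructions, so a task that places the question at the start of the prompt may need a different trimming strategy.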
Evaluation
This model is designed for evaluation on the Loong benchmark, which tests long-context understanding across:
- Spotlight Locating
- Comparison tasks
- Clustering
- Chain of Reasoning
Model Files
- config.json: Model configuration
- mamba_config.json: Mamba-specific configuration
- model.safetensors: Model weights
- tokenizer.json: Tokenizer configuration
- generation_config.json: Generation parameters
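After downloading the repository to a local directory, it can be useful to confirm that the files named in this section are all present before loading the model. The sketch below does this with the standard library only. The function name and the idea of a local `model_dir` are assumptions for illustration.

```python
from pathlib import Path

# File names as listed in the Model Files section of this card.
EXPECTED_FILES = [
    "config.json",
    "mamba_config.json",
    "model.safetensors",
    "tokenizer.json",
    "generation_config.json",
]

def missing_model_files(model_dir):
    """Return the expected file names that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]
```

An empty list means every listed file was found. A non-empty result names the files to download again.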
Citation
@misc{your_model_name_2024,
title={Distilled Hybrid Mamba-Transformer for Long-Context Understanding},
author={Your Name},
year={2024},
url={https://huggingface.co/shivi101/tulu3-control-ssm50-50}
}
Model tree for shivi101/tulu3-control-ssm50-50
Base model
meta-llama/Llama-3.2-3B-Instruct