---
library_name: nanotron
---

# Nano-Mistral

Modeling code for Mistral, for use with [Nanotron](https://github.com/huggingface/nanotron/).

Also contains pretrained weights for Mistral-7B-v0.1 (https://huggingface.co/mistralai/Mistral-7B-v0.1) converted for use with Nanotron.

## Quickstart

```bash
# Generate a config file
python config_tiny_mistral.py

# Run training
export CUDA_DEVICE_MAX_CONNECTIONS=1 # important for some distributed operations
torchrun --nproc_per_node=8 run_train.py --config-file config_tiny_mistral.yaml
```
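
The `--nproc_per_node` value has to match the number of GPUs the config expects: Nanotron expects the launch world size to equal the product of the data-, tensor- and pipeline-parallel sizes set in the config. As a sketch, a single-GPU smoke test (assuming you have set all three parallel sizes to 1 in `config_tiny_mistral.py` before generating the YAML) would be:

```bash
# Single-GPU smoke test: assumes data-, tensor- and pipeline-parallel sizes are all 1 in the config
export CUDA_DEVICE_MAX_CONNECTIONS=1
torchrun --nproc_per_node=1 run_train.py --config-file config_tiny_mistral.yaml
```
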
## Run generation with pretrained Mistral-7B-v0.1

```bash
export CUDA_DEVICE_MAX_CONNECTIONS=1
torchrun --nproc_per_node=1 run_generate.py --ckpt-path ./pretrained/Mistral-7B-v0.1
```
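
`--ckpt-path` should point at the converted (Nanotron-format) checkpoint. If it is not on disk yet, one way to fetch it is with `huggingface_hub`; this is only a sketch, and the `repo_id` below is a placeholder for the repo that hosts the converted weights (the `mistralai/Mistral-7B-v0.1` repo linked above holds the original Transformers-format weights):

```python
# Sketch: download the converted Nanotron checkpoint to the path used by run_generate.py.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="<org>/<nanotron-mistral-7b-checkpoint>",  # placeholder, replace with the repo hosting the converted weights
    local_dir="./pretrained/Mistral-7B-v0.1",
)
```
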
## Use your custom model

- Update the `MistralConfig` class in `config_tiny_mistral.py` to match your model's configuration
- Update the `MistralForTraining` class in `modeling_mistral.py` to match your model's architecture
- Pass both classes to the `DistributedTrainer` constructor in `run_train.py`:
```python
trainer = DistributedTrainer(config_file, model_class=MistralForTraining, model_config_class=MistralConfig)
```
- Run training as usual (see the sketch below)
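
For reference, here is a minimal sketch of how the pieces fit together in `run_train.py`. The `nanotron.trainer` import path and the dataloader step are assumptions; keep whatever the existing `run_train.py` does and only swap in the two custom classes:

```python
# Minimal sketch, not this repo's exact run_train.py.
import argparse

from nanotron.trainer import DistributedTrainer  # assumed import path

from config_tiny_mistral import MistralConfig
from modeling_mistral import MistralForTraining


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--config-file", type=str, required=True)
    args = parser.parse_args()

    # Plug the custom model and config classes into Nanotron's trainer.
    trainer = DistributedTrainer(
        args.config_file,
        model_class=MistralForTraining,
        model_config_class=MistralConfig,
    )

    # Build the dataloader(s) exactly as the original run_train.py does, then:
    # trainer.train(dataloader)


if __name__ == "__main__":
    main()
```

Launch it with `torchrun` exactly as in the Quickstart above.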