---
library_name: transformers
license: other
license_name: eva-llama3.3
base_model: meta-llama/Llama-3.3-70B-Instruct
tags:
- generated_from_trainer
model-index:
- name: dev/shm/EVA-LLaMA-3.33-70B-v0.1
  results: []
datasets:
- anthracite-org/kalo-opus-instruct-22k-no-refusal
- Nopm/Opus_WritingStruct
- Gryphe/Sonnet3.5-SlimOrcaDedupCleaned
- Gryphe/Sonnet3.5-Charcard-Roleplay
- Gryphe/ChatGPT-4o-Writing-Prompts
- Epiculous/Synthstruct-Gens-v1.1-Filtered-n-Cleaned
- Epiculous/SynthRP-Gens-v1.1-Filtered-n-Cleaned
- nothingiisreal/Reddit-Dirty-And-WritingPrompts
- allura-org/Celeste-1.x-data-mixture
- cognitivecomputations/dolphin-2.9.3
---

# EVA LLaMA 3.33 70B v0.1

An RP/storywriting specialist model: a full-parameter finetune of Llama-3.3-70B-Instruct on a mixture of synthetic and natural data.
It uses the Celeste 70B 0.1 data mixture, greatly expanded to improve the versatility, creativity, and "flavor" of the resulting model.
This model was built with Llama by Meta.

The prompt format is Llama 3.
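
A minimal sketch of building a Llama 3-formatted prompt with the bundled chat template (the repo id here is an assumption for illustration; substitute the actual model path):

```python
from transformers import AutoTokenizer

# Repo id assumed for illustration; point this at wherever the model lives.
tokenizer = AutoTokenizer.from_pretrained("EVA-UNIT-01/EVA-LLaMA-3.33-70B-v0.1")

messages = [
    {"role": "system", "content": "You are a roleplay and storywriting assistant."},
    {"role": "user", "content": "Describe a rain-soaked harbor town at dusk."},
]

# Renders the Llama 3 structure:
# <|start_header_id|>role<|end_header_id|> ... <|eot_id|>
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)
```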


Recommended sampler values:
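
As a sketch of where sampler values plug in with `transformers` generation (the numbers below are placeholders, not this card's recommendations; the repo id is an assumption):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "EVA-UNIT-01/EVA-LLaMA-3.33-70B-v0.1"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "Continue: the door creaked open..."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Placeholder sampler values -- substitute the recommended settings.
output = model.generate(
    input_ids,
    max_new_tokens=512,
    do_sample=True,
    temperature=1.0,
    top_p=0.95,
    repetition_penalty=1.05,
)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```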

Recommended SillyTavern preset (via Virt-io):


Training data:
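
The sets listed in the metadata above are fed to the axolotl config below as ShareGPT-style JSONL. One record looks roughly like this (structure per the common ShareGPT convention, not verified against these exact files):

```python
import json

# One ShareGPT-style record as consumed by axolotl's `type: sharegpt` loader.
# Roles follow the usual convention: "system", "human", "gpt".
record = {
    "conversations": [
        {"from": "system", "value": "You are a storytelling assistant."},
        {"from": "human", "value": "Write the opening of a lighthouse mystery."},
        {"from": "gpt", "value": "The lamp turned slowly above a black sea..."},
    ]
}
print(json.dumps(record, indent=2))
```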

Training time and hardware:


The model was created by Kearm, Auri and Cahvay.

Special thanks:

Licensing

Llama-3.3-70B-Instruct by Meta is licensed under the Llama 3.3 Community License Agreement (hereafter referred to as the L3.3 license) and is subject to the Acceptable Use Policy for Llama Materials.
This derivative is free for personal, research, and commercial use under the terms of the L3.3 license, with one extra clause:
- Infermatic Inc and any of its employees or paid associates cannot utilize, distribute, download, or otherwise make use of EVA models for any purpose.

[Built with Axolotl](https://github.com/axolotl-ai-cloud/axolotl)
<details><summary>See axolotl config</summary>

axolotl version: `0.4.1`
```yaml
base_model: meta-llama/Llama-3.3-70B-Instruct

plugins:
  - axolotl.integrations.liger.LigerPlugin
liger_rope: true
liger_rms_norm: true
liger_swiglu: true
liger_fused_linear_cross_entropy: true

strict: false

chat_template: llama3
datasets:
  - path: datasets/Celeste_Filtered_utf8fix.jsonl
    type: sharegpt
  - path: datasets/deduped_not_samantha_norefusals.jsonl
    type: sharegpt
  - path: datasets/deduped_SynthRP-Gens_processed_ShareGPT_converted_cleaned.jsonl
    type: sharegpt
  - path: datasets/deduped_Synthstruct-Gens_processed_sharegpt_converted_cleaned.jsonl
    type: sharegpt
  - path: datasets/Gryphe-4o-WP-filtered-sharegpt_utf8fix.jsonl
    type: sharegpt
  - path: datasets/opus-instruct-22k-no_refusals-filtered_utf8fix.jsonl
    type: sharegpt
  - path: datasets/Sonnet3-5-charcard-names-filtered-sharegpt_utf8fix.jsonl
    type: sharegpt
  - path: datasets/SystemChat_subset_filtered_sharegpt_utf8fix.jsonl
    type: sharegpt

dataset_prepared_path: last_run_prepared
val_set_size: 0.001
output_dir: /dev/shm/EVA-LLaMA-3.33-70B-v0.1

sequence_len: 8192
sample_packing: true
eval_sample_packing: false
pad_to_sequence_len: true

wandb_project: EVA-LLaMA-3.33-70B
wandb_entity:
wandb_watch:
wandb_name: Unit-v0.1
wandb_log_model:

unfrozen_parameters:
  - ^lm_head.weight$
  - ^model.embed_tokens.weight$
  # mlp.down_proj layers
  - model.layers.40.mlp.down_proj
  - model.layers.44.mlp.down_proj
  - model.layers.45.mlp.down_proj
  - model.layers.46.mlp.down_proj
  - model.layers.43.mlp.down_proj
  - model.layers.52.mlp.down_proj
  - model.layers.47.mlp.down_proj
  - model.layers.39.mlp.down_proj
  - model.layers.48.mlp.down_proj
  - model.layers.49.mlp.down_proj
  - model.layers.38.mlp.down_proj
  - model.layers.53.mlp.down_proj
  - model.layers.35.mlp.down_proj
  - model.layers.41.mlp.down_proj
  - model.layers.51.mlp.down_proj
  - model.layers.42.mlp.down_proj
  - model.layers.37.mlp.down_proj
  - model.layers.50.mlp.down_proj
  - model.layers.76.mlp.down_proj
  - model.layers.60.mlp.down_proj
  - model.layers.36.mlp.down_proj
  - model.layers.54.mlp.down_proj
  - model.layers.57.mlp.down_proj
  - model.layers.56.mlp.down_proj
  - model.layers.59.mlp.down_proj
  - model.layers.55.mlp.down_proj
  - model.layers.77.mlp.down_proj
  - model.layers.61.mlp.down_proj
  - model.layers.58.mlp.down_proj
  - model.layers.65.mlp.down_proj
  - model.layers.75.mlp.down_proj
  - model.layers.64.mlp.down_proj
  - model.layers.62.mlp.down_proj
  - model.layers.68.mlp.down_proj
  - model.layers.19.mlp.down_proj
  - model.layers.73.mlp.down_proj
  - model.layers.66.mlp.down_proj
  - model.layers.67.mlp.down_proj
  - model.layers.63.mlp.down_proj
  - model.layers.74.mlp.down_proj
  # mlp.gate_proj layers
  - model.layers.70.mlp.gate_proj
  - model.layers.71.mlp.gate_proj
  - model.layers.67.mlp.gate_proj
  - model.layers.58.mlp.gate_proj
  - model.layers.55.mlp.gate_proj
  - model.layers.57.mlp.gate_proj
  - model.layers.56.mlp.gate_proj
  - model.layers.66.mlp.gate_proj
  - model.layers.72.mlp.gate_proj
  - model.layers.52.mlp.gate_proj
  - model.layers.69.mlp.gate_proj
  - model.layers.54.mlp.gate_proj
  - model.layers.62.mlp.gate_proj
  - model.layers.60.mlp.gate_proj
  - model.layers.59.mlp.gate_proj
  - model.layers.74.mlp.gate_proj
  - model.layers.51.mlp.gate_proj
  - model.layers.68.mlp.gate_proj
  - model.layers.61.mlp.gate_proj
  - model.layers.53.mlp.gate_proj
  - model.layers.73.mlp.gate_proj
  - model.layers.63.mlp.gate_proj
  - model.layers.48.mlp.gate_proj
  - model.layers.49.mlp.gate_proj
  - model.layers.64.mlp.gate_proj
  - model.layers.50.mlp.gate_proj
  - model.layers.65.mlp.gate_proj
  - model.layers.47.mlp.gate_proj
  - model.layers.44.mlp.gate_proj
  - model.layers.45.mlp.gate_proj
  - model.layers.75.mlp.gate_proj
  - model.layers.46.mlp.gate_proj
  - model.layers.43.mlp.gate_proj
  - model.layers.77.mlp.gate_proj
  - model.layers.41.mlp.gate_proj
  - model.layers.40.mlp.gate_proj
  - model.layers.42.mlp.gate_proj
  - model.layers.32.mlp.gate_proj
  - model.layers.30.mlp.gate_proj
  - model.layers.39.mlp.gate_proj
  # mlp.up_proj layers
  - model.layers.70.mlp.up_proj
  - model.layers.67.mlp.up_proj
  - model.layers.66.mlp.up_proj
  - model.layers.69.mlp.up_proj
  - model.layers.62.mlp.up_proj
  - model.layers.63.mlp.up_proj
  - model.layers.65.mlp.up_proj
  - model.layers.68.mlp.up_proj
  - model.layers.71.mlp.up_proj
  - model.layers.64.mlp.up_proj
  - model.layers.61.mlp.up_proj
  - model.layers.58.mlp.up_proj
  - model.layers.59.mlp.up_proj
  - model.layers.57.mlp.up_proj
  - model.layers.55.mlp.up_proj
  - model.layers.72.mlp.up_proj
  - model.layers.54.mlp.up_proj
  - model.layers.56.mlp.up_proj
  - model.layers.60.mlp.up_proj
  - model.layers.73.mlp.up_proj
  - model.layers.50.mlp.up_proj
  - model.layers.51.mlp.up_proj
  - model.layers.53.mlp.up_proj
  - model.layers.52.mlp.up_proj
  - model.layers.74.mlp.up_proj
  - model.layers.49.mlp.up_proj
  - model.layers.30.mlp.up_proj
  - model.layers.47.mlp.up_proj
  - model.layers.46.mlp.up_proj
  - model.layers.34.mlp.up_proj
  - model.layers.48.mlp.up_proj
  - model.layers.38.mlp.up_proj
  - model.layers.45.mlp.up_proj
  - model.layers.43.mlp.up_proj
  - model.layers.29.mlp.up_proj
  - model.layers.42.mlp.up_proj
  - model.layers.75.mlp.up_proj
  - model.layers.35.mlp.up_proj
  - model.layers.44.mlp.up_proj
  - model.layers.31.mlp.up_proj
  # self_attn.k_proj layers
  - model.layers.72.self_attn.k_proj
  - model.layers.75.self_attn.k_proj
  - model.layers.71.self_attn.k_proj
  - model.layers.74.self_attn.k_proj
  - model.layers.44.self_attn.k_proj
  - model.layers.31.self_attn.k_proj
  - model.layers.33.self_attn.k_proj
  - model.layers.34.self_attn.k_proj
  - model.layers.76.self_attn.k_proj
  - model.layers.78.self_attn.k_proj
  - model.layers.77.self_attn.k_proj
  - model.layers.60.self_attn.k_proj
  - model.layers.56.self_attn.k_proj
  - model.layers.22.self_attn.k_proj
  - model.layers.2.self_attn.k_proj
  - model.layers.18.self_attn.k_proj
  - model.layers.17.self_attn.k_proj
  - model.layers.21.self_attn.k_proj
  - model.layers.19.self_attn.k_proj
  - model.layers.23.self_attn.k_proj
  - model.layers.52.self_attn.k_proj
  - model.layers.73.self_attn.k_proj
  - model.layers.35.self_attn.k_proj
  - model.layers.15.self_attn.k_proj
  - model.layers.27.self_attn.k_proj
  - model.layers.29.self_attn.k_proj
  - model.layers.36.self_attn.k_proj
  - model.layers.28.self_attn.k_proj
  - model.layers.20.self_attn.k_proj
  - model.layers.25.self_attn.k_proj
  - model.layers.37.self_attn.k_proj
  - model.layers.30.self_attn.k_proj
  - model.layers.41.self_attn.k_proj
  - model.layers.16.self_attn.k_proj
  - model.layers.32.self_attn.k_proj
  - model.layers.68.self_attn.k_proj
  - model.layers.26.self_attn.k_proj
  - model.layers.38.self_attn.k_proj
  - model.layers.39.self_attn.k_proj
  - model.layers.70.self_attn.k_proj
  # self_attn.o_proj layers
  - model.layers.50.self_attn.o_proj
  - model.layers.61.self_attn.o_proj
  - model.layers.46.self_attn.o_proj
  - model.layers.53.self_attn.o_proj
  - model.layers.54.self_attn.o_proj
  - model.layers.19.self_attn.o_proj
  - model.layers.42.self_attn.o_proj
  - model.layers.41.self_attn.o_proj
  - model.layers.49.self_attn.o_proj
  - model.layers.68.self_attn.o_proj
  - model.layers.18.self_attn.o_proj
  - model.layers.45.self_attn.o_proj
  - model.layers.11.self_attn.o_proj
  - model.layers.48.self_attn.o_proj
  - model.layers.51.self_attn.o_proj
  - model.layers.67.self_attn.o_proj
  - model.layers.64.self_attn.o_proj
  - model.layers.13.self_attn.o_proj
  - model.layers.14.self_attn.o_proj
  - model.layers.16.self_attn.o_proj
  - model.layers.17.self_attn.o_proj
  - model.layers.47.self_attn.o_proj
  - model.layers.0.self_attn.o_proj
  - model.layers.20.self_attn.o_proj
  - model.layers.63.self_attn.o_proj
  - model.layers.5.self_attn.o_proj
  - model.layers.15.self_attn.o_proj
  - model.layers.21.self_attn.o_proj
  - model.layers.52.self_attn.o_proj
  - model.layers.12.self_attn.o_proj
  - model.layers.10.self_attn.o_proj
  - model.layers.56.self_attn.o_proj
  - model.layers.62.self_attn.o_proj
  - model.layers.22.self_attn.o_proj
  - model.layers.6.self_attn.o_proj
  - model.layers.7.self_attn.o_proj
  - model.layers.43.self_attn.o_proj
  - model.layers.38.self_attn.o_proj
  - model.layers.9.self_attn.o_proj
  - model.layers.44.self_attn.o_proj
  # self_attn.q_proj layers
  - model.layers.2.self_attn.q_proj
  - model.layers.4.self_attn.q_proj
  - model.layers.46.self_attn.q_proj
  - model.layers.5.self_attn.q_proj
  - model.layers.7.self_attn.q_proj
  - model.layers.6.self_attn.q_proj
  - model.layers.9.self_attn.q_proj
  - model.layers.10.self_attn.q_proj
  - model.layers.1.self_attn.q_proj
  - model.layers.18.self_attn.q_proj
  - model.layers.62.self_attn.q_proj
  - model.layers.8.self_attn.q_proj
  - model.layers.15.self_attn.q_proj
  - model.layers.14.self_attn.q_proj
  - model.layers.31.self_attn.q_proj
  - model.layers.17.self_attn.q_proj
  - model.layers.16.self_attn.q_proj
  - model.layers.19.self_attn.q_proj
  - model.layers.12.self_attn.q_proj
  - model.layers.33.self_attn.q_proj
  - model.layers.35.self_attn.q_proj
  - model.layers.21.self_attn.q_proj
  - model.layers.13.self_attn.q_proj
  - model.layers.27.self_attn.q_proj
  - model.layers.56.self_attn.q_proj
  - model.layers.34.self_attn.q_proj
  - model.layers.11.self_attn.q_proj
  - model.layers.52.self_attn.q_proj
  - model.layers.28.self_attn.q_proj
  - model.layers.54.self_attn.q_proj
  - model.layers.30.self_attn.q_proj
  - model.layers.29.self_attn.q_proj
  - model.layers.20.self_attn.q_proj
  - model.layers.75.self_attn.q_proj
  - model.layers.37.self_attn.q_proj
  - model.layers.44.self_attn.q_proj
  - model.layers.23.self_attn.q_proj
  - model.layers.64.self_attn.q_proj
  - model.layers.60.self_attn.q_proj
  - model.layers.36.self_attn.q_proj
  # self_attn.v_proj layers
  - model.layers.11.self_attn.v_proj
  - model.layers.17.self_attn.v_proj
  - model.layers.37.self_attn.v_proj
  - model.layers.40.self_attn.v_proj
  - model.layers.41.self_attn.v_proj
  - model.layers.42.self_attn.v_proj
  - model.layers.43.self_attn.v_proj
  - model.layers.44.self_attn.v_proj
  - model.layers.45.self_attn.v_proj
  - model.layers.46.self_attn.v_proj
  - model.layers.48.self_attn.v_proj
  - model.layers.49.self_attn.v_proj
  - model.layers.50.self_attn.v_proj
  - model.layers.51.self_attn.v_proj
  - model.layers.53.self_attn.v_proj
  - model.layers.54.self_attn.v_proj
  - model.layers.55.self_attn.v_proj
  - model.layers.57.self_attn.v_proj
  - model.layers.58.self_attn.v_proj
  - model.layers.59.self_attn.v_proj
  - model.layers.60.self_attn.v_proj
  - model.layers.61.self_attn.v_proj
  - model.layers.62.self_attn.v_proj
  - model.layers.63.self_attn.v_proj
  - model.layers.64.self_attn.v_proj
  - model.layers.65.self_attn.v_proj
  - model.layers.66.self_attn.v_proj
  - model.layers.67.self_attn.v_proj
  - model.layers.69.self_attn.v_proj
  - model.layers.75.self_attn.v_proj
  - model.layers.18.self_attn.v_proj
  - model.layers.78.self_attn.v_proj
  - model.layers.68.self_attn.v_proj
  - model.layers.47.self_attn.v_proj
  - model.layers.38.self_attn.v_proj
  - model.layers.39.self_attn.v_proj
  - model.layers.71.self_attn.v_proj
  - model.layers.19.self_attn.v_proj
  - model.layers.36.self_attn.v_proj
  - model.layers.20.self_attn.v_proj

gradient_accumulation_steps: 8
micro_batch_size: 1
num_epochs: 3
optimizer: paged_adamw_8bit
lr_scheduler: cosine
learning_rate: 0.00003
max_grad_norm: 2

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false

gradient_checkpointing: "unsloth"
gradient_checkpointing_kwargs:
  use_reentrant: false
early_stopping_patience:
resume_from_checkpoint:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_steps: 20
evals_per_epoch: 4
eval_table_size:
saves_per_epoch: 1
debug:
deepspeed: deepspeed_configs/zero3_bf16.json
weight_decay: 0.2
```

</details>
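
The `unfrozen_parameters` list above keeps most of the network frozen and trains only the matched tensors. Axolotl applies this internally; as a rough plain-PyTorch sketch of the idea (the matching logic here is an assumption, treating each entry as a regex/substring pattern):

```python
import re
import torch
from transformers import AutoModelForCausalLM

# Patterns taken from the config's `unfrozen_parameters` (abbreviated here).
patterns = [
    r"^lm_head.weight$",
    r"^model.embed_tokens.weight$",
    "model.layers.40.mlp.down_proj",
    # ... the remaining entries from the config above
]

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.3-70B-Instruct", torch_dtype=torch.bfloat16
)

# Freeze everything, then re-enable gradients only for parameters whose
# names match one of the listed patterns.
for name, param in model.named_parameters():
    param.requires_grad = any(re.search(p, name) for p in patterns)

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"trainable parameters: {trainable:,}")
```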