TristanBehrens committed on
Commit
17b266c
1 Parent(s): c0aac03

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +50 -0
README.md ADDED
@@ -0,0 +1,50 @@
+ ---
+ language:
+ - en
+ tags:
+ - NLP
+ license: mit
+ datasets:
+ - TristanBehrens/bach_garland_2024-100K
+ base_model: None
+ ---
+
+ # bach_garland_mamba - An xLSTM Model
+
+ ![Trained with Helibrunna](banner.jpg)
+
+ Trained with [Helibrunna](https://github.com/AI-Guru/helibrunna) by [Dr. Tristan Behrens](https://de.linkedin.com/in/dr-tristan-behrens-734967a2).
+
+ ## Configuration
+
+ ```
+ training:
+   model_name: bach_garland_mamba
+   batch_size: 28
+   lr: 0.001
+   lr_warmup_steps: 1428
+   lr_decay_until_steps: 14285
+   lr_decay_factor: 0.001
+   weight_decay: 0.1
+   amp_precision: bfloat16
+   weight_precision: float32
+   enable_mixed_precision: true
+   num_epochs: 8
+   output_dir: output/bach_garland_mamba
+   save_every_step: 500
+   log_every_step: 10
+   wandb_project: bach_garland
+   torch_compile: false
+ model:
+   type: mamba
+   d_model: 64
+   n_layers: 4
+   context_length: 4096
+   vocab_size: 178
+ dataset:
+   hugging_face_id: TristanBehrens/bach_garland_2024-100K
+ tokenizer:
+   type: whitespace
+   fill_token: '[EOS]'
+
+ ```
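
The schedule settings in the configuration above imply a few derived quantities that may be easier to read as numbers. The following is a minimal sketch, not part of Helibrunna: the helper `summarize_schedule` is hypothetical, and it assumes that `lr_decay_factor` multiplies the peak learning rate to give the final learning rate and that every step consumes exactly one batch (no gradient accumulation).

```python
# Values copied from the `training` section of the configuration above.
training = {
    "batch_size": 28,
    "lr": 0.001,
    "lr_warmup_steps": 1428,
    "lr_decay_until_steps": 14285,
    "lr_decay_factor": 0.001,
}


def summarize_schedule(cfg):
    """Derive a few quantities from the schedule settings (hypothetical helper).

    Assumptions: lr_decay_factor scales the peak learning rate to obtain
    the final learning rate, and each step processes one batch.
    """
    # Share of the decay horizon that is spent warming up.
    warmup_fraction = cfg["lr_warmup_steps"] / cfg["lr_decay_until_steps"]
    # Learning rate reached once the decay has finished.
    final_lr = cfg["lr"] * cfg["lr_decay_factor"]
    # Number of sequences processed by the end of the decay.
    sequences_seen = cfg["lr_decay_until_steps"] * cfg["batch_size"]
    return {
        "warmup_fraction": warmup_fraction,
        "final_lr": final_lr,
        "sequences_seen": sequences_seen,
    }


summary = summarize_schedule(training)
print(summary)
```

Under these assumptions, warmup covers roughly a tenth of the decay horizon and the learning rate ends about three orders of magnitude below its peak.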