macadeliccc committed on
Commit
8a2c9e9
1 Parent(s): 892783d

Upload folder using huggingface_hub

.gitattributes CHANGED
@@ -33,3 +33,5 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
33
  *.zip filter=lfs diff=lfs merge=lfs -text
34
  *.zst filter=lfs diff=lfs merge=lfs -text
35
  *tfevents* filter=lfs diff=lfs merge=lfs -text
36
+ magistrate-3.2-3b-it.Q2_K.gguf filter=lfs diff=lfs merge=lfs -text
37
+ magistrate-3.2-3b-it.bf16.gguf filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,764 @@
1
+ ---
2
+ base_model: macadeliccc/magistrate-3.2-3b-base
3
+ datasets:
4
+ - teknium/OpenHermes-2.5
5
+ - NousResearch/hermes-function-calling-v1
6
+ - arcee-ai/The-Tome
7
+ - cognitivecomputations/SystemChat-2.0
8
+ language:
9
+ - en
10
+ library_name: transformers
11
+ license: llama3.2
12
+ pipeline_tag: text-generation
13
+ tags:
14
+ - spectrum
15
+ - llama-3
16
+ - axolotl
17
+ - legal
18
+ - HFforLegal
19
+ - autoquant
20
+ - gguf
21
+ ---
22
+ # magistrate-3.2-3b-it
23
+
24
+ This model is a fine-tuned version of [macadeliccc/magistrate-3.2-3b-base](https://huggingface.co/macadeliccc/magistrate-3.2-3b-base) on the datasets described under "Training and evaluation data" below.
25
+ It achieves the following results on the evaluation set:
26
+ - Loss: 0.8067
27
+
28
+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
29
+ should probably proofread and complete it, then remove this comment. -->
30
+
31
+ [<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
32
+ <details><summary>See axolotl config</summary>
33
+
34
+ axolotl version: `0.4.1`
35
+ ```yaml
36
+ base_model: macadeliccc/magistrate-3.2-3b-base
37
+ model_type: LlamaForCausalLM
38
+ tokenizer_type: AutoTokenizer
39
+
40
+ load_in_8bit: false
41
+ load_in_4bit: false
42
+ strict: false
43
+
44
+ datasets:
45
+ - path: json
46
+ type: sharegpt
47
+ conversation: chatml
48
+ data_files: train/hermes-2.5.jsonl
49
+ # - path: json
50
+ # type: sharegpt
51
+ # conversation: chatml
52
+ # data_files: train/financial_instructions_cleaned_2.json
53
+ - path: json
54
+ type: sharegpt
55
+ conversation: chatml
56
+ data_files: train/glaive-function-calling-5k.json
57
+ - path: json
58
+ type: sharegpt
59
+ conversation: chatml
60
+ data_files: train/func-calling-singleturn.json
61
+ - path: json
62
+ type: sharegpt
63
+ conversation: chatml
64
+ data_files: train/func-calling.json
65
+ - path: json
66
+ type: sharegpt
67
+ conversation: chatml
68
+ data_files: train/json-mode-agentic.json
69
+ - path: json
70
+ type: sharegpt
71
+ conversation: chatml
72
+ data_files: train/json-mode-singleturn.json
73
+ - path: json
74
+ type: sharegpt
75
+ conversation: chatml
76
+ data_files: train/reasoning_sharegpt.json
77
+ - path: json
78
+ type: sharegpt
79
+ conversation: chatml
80
+ data_files: train/systemchat_2_0_small.json
81
+ - path: json
82
+ type: sharegpt
83
+ conversation: chatml
84
+ data_files: train/argument_dataset/303_creative_llc_v__elenis_sharegpt.json
85
+ - path: json
86
+ type: sharegpt
87
+ conversation: chatml
88
+ data_files: train/argument_dataset/abitron_austria_gmbh_v__hetronic_international__inc__sharegpt.json
89
+ - path: json
90
+ type: sharegpt
91
+ conversation: chatml
92
+ data_files: train/argument_dataset/acheson_hotels__llc_v__laufer_sharegpt.json
93
+ - path: json
94
+ type: sharegpt
95
+ conversation: chatml
96
+ data_files: train/argument_dataset/alexander_v__sc_conference_of_naacp_sharegpt.json
97
+ - path: json
98
+ type: sharegpt
99
+ conversation: chatml
100
+ data_files: train/argument_dataset/amgen_inc__v__sanofi_sharegpt.json
101
+ - path: json
102
+ type: sharegpt
103
+ conversation: chatml
104
+ data_files: train/argument_dataset/andy_warhol_found___inc__v__goldsmith_sharegpt.json
105
+ - path: json
106
+ type: sharegpt
107
+ conversation: chatml
108
+ data_files: train/argument_dataset/arizona_v__navajo_nation_sharegpt.json
109
+ - path: json
110
+ type: sharegpt
111
+ conversation: chatml
112
+ data_files: train/argument_dataset/becerra__sec__of_h_hs_v__san_carlos_apache_tribe_sharegpt.json
113
+ - path: json
114
+ type: sharegpt
115
+ conversation: chatml
116
+ data_files: train/argument_dataset/biden_v__nebraska_sharegpt.json
117
+ - path: json
118
+ type: sharegpt
119
+ conversation: chatml
120
+ data_files: train/argument_dataset/bissonnette_v__lepage_bakeries_park_st___llc_sharegpt.json
121
+ - path: json
122
+ type: sharegpt
123
+ conversation: chatml
124
+ data_files: train/argument_dataset/bittner_v__united_states_sharegpt.json
125
+ - path: json
126
+ type: sharegpt
127
+ conversation: chatml
128
+ data_files: train/argument_dataset/brown_v__united_states_sharegpt.json
129
+ - path: json
130
+ type: sharegpt
131
+ conversation: chatml
132
+ data_files: train/argument_dataset/cantero_v__bank_of_america__n_a__sharegpt.json
133
+ - path: json
134
+ type: sharegpt
135
+ conversation: chatml
136
+ data_files: train/argument_dataset/cfpb_v__com__fin__services_assn__sharegpt.json
137
+ - path: json
138
+ type: sharegpt
139
+ conversation: chatml
140
+ data_files: train/argument_dataset/chiaverini_v__city_of_napoleon_sharegpt.json
141
+ - path: json
142
+ type: sharegpt
143
+ conversation: chatml
144
+ data_files: train/argument_dataset/ciminelli_v__united_state_sharegpt.json
145
+ - path: json
146
+ type: sharegpt
147
+ conversation: chatml
148
+ data_files: train/argument_dataset/city_of_grants_pass_v__johnson_sharegpt.json
149
+ - path: json
150
+ type: sharegpt
151
+ conversation: chatml
152
+ data_files: train/argument_dataset/coinbase__inc__v__bielski_sharegpt.json
153
+ - path: json
154
+ type: sharegpt
155
+ conversation: chatml
156
+ data_files: train/argument_dataset/coinbase__inc__v__suski_sharegpt.json
157
+ - path: json
158
+ type: sharegpt
159
+ conversation: chatml
160
+ data_files: train/argument_dataset/connelly_v__united_states_sharegpt.json
161
+ - path: json
162
+ type: sharegpt
163
+ conversation: chatml
164
+ data_files: train/argument_dataset/corner_post__inc__v__bd__of_governors__frs_sharegpt.json
165
+ - path: json
166
+ type: sharegpt
167
+ conversation: chatml
168
+ data_files: train/argument_dataset/counterman_v__colorado_sharegpt.json
169
+ - path: json
170
+ type: sharegpt
171
+ conversation: chatml
172
+ data_files: train/argument_dataset/cruz_v__arizona_sharegpt.json
173
+ - path: json
174
+ type: sharegpt
175
+ conversation: chatml
176
+ data_files: train/argument_dataset/culley_v__marshall_sharegpt.json
177
+ - path: json
178
+ type: sharegpt
179
+ conversation: chatml
180
+ data_files: train/argument_dataset/dept__of_agric__rural_dev__v__kirtz_sharegpt.json
181
+ - path: json
182
+ type: sharegpt
183
+ conversation: chatml
184
+ data_files: train/argument_dataset/dept__of_education_v__brown_sharegpt.json
185
+ - path: json
186
+ type: sharegpt
187
+ conversation: chatml
188
+ data_files: train/argument_dataset/dept__of_state_v__munoz_sharegpt.json
189
+ - path: json
190
+ type: sharegpt
191
+ conversation: chatml
192
+ data_files: train/argument_dataset/devillier_v__texas_sharegpt.json
193
+ - path: json
194
+ type: sharegpt
195
+ conversation: chatml
196
+ data_files: train/argument_dataset/diaz_v__united_states_sharegpt.json
197
+ - path: json
198
+ type: sharegpt
199
+ conversation: chatml
200
+ data_files: train/argument_dataset/dubin_v__united_states_sharegpt.json
201
+ - path: json
202
+ type: sharegpt
203
+ conversation: chatml
204
+ data_files: train/argument_dataset/dupree_v__younger_sharegpt.json
205
+ - path: json
206
+ type: sharegpt
207
+ conversation: chatml
208
+ data_files: train/argument_dataset/erlinger_v__united_states_sharegpt.json
209
+ - path: json
210
+ type: sharegpt
211
+ conversation: chatml
212
+ data_files: train/argument_dataset/fbi_v__fikre_sharegpt.json
213
+ - path: json
214
+ type: sharegpt
215
+ conversation: chatml
216
+ data_files: train/argument_dataset/fda_v__alliance_hippocratic_medicine_sharegpt.json
217
+ - path: json
218
+ type: sharegpt
219
+ conversation: chatml
220
+ data_files: train/argument_dataset/financial_oversight_board_v__cpi_sharegpt.json
221
+ - path: json
222
+ type: sharegpt
223
+ conversation: chatml
224
+ data_files: train/argument_dataset/fischer_v__united_states_sharegpt.json
225
+ - path: json
226
+ type: sharegpt
227
+ conversation: chatml
228
+ data_files: train/argument_dataset/garland__att_y_gen__v__cargill_sharegpt.json
229
+ - path: json
230
+ type: sharegpt
231
+ conversation: chatml
232
+ data_files: train/argument_dataset/glacier_northwest__inc__v__int_l_brotherhood_of_teamsters_sharegpt.json
233
+ - path: json
234
+ type: sharegpt
235
+ conversation: chatml
236
+ data_files: train/argument_dataset/gonzalez_v__google_llc_sharegpt.json
237
+ - path: json
238
+ type: sharegpt
239
+ conversation: chatml
240
+ data_files: train/argument_dataset/gonzalez_v__trevino_sharegpt.json
241
+ - path: json
242
+ type: sharegpt
243
+ conversation: chatml
244
+ data_files: train/argument_dataset/great_lakes_insurance_se_v__raiders_retreat_realty_co___llc_sharegpt.json
245
+ - path: json
246
+ type: sharegpt
247
+ conversation: chatml
248
+ data_files: train/argument_dataset/groff_v__dejoy_sharegpt.json
249
+ - path: json
250
+ type: sharegpt
251
+ conversation: chatml
252
+ data_files: train/argument_dataset/harrington_v__purdue_pharma_l_p__sharegpt.json
253
+ - path: json
254
+ type: sharegpt
255
+ conversation: chatml
256
+ data_files: train/argument_dataset/harrow_v__dept__of_defense_sharegpt.json
257
+ - path: json
258
+ type: sharegpt
259
+ conversation: chatml
260
+ data_files: train/argument_dataset/health_and_hospital_corp__v__talevski_sharegpt.json
261
+ - path: json
262
+ type: sharegpt
263
+ conversation: chatml
264
+ data_files: train/argument_dataset/helix_energy_solutions_v__hewitt_sharegpt.json
265
+ - path: json
266
+ type: sharegpt
267
+ conversation: chatml
268
+ data_files: train/argument_dataset/in_re_grand_jury_sharegpt.json
269
+ - path: json
270
+ type: sharegpt
271
+ conversation: chatml
272
+ data_files: train/argument_dataset/jack_daniel_s_properties__inc__v__vip_products_sharegpt.json
273
+ - path: json
274
+ type: sharegpt
275
+ conversation: chatml
276
+ data_files: train/argument_dataset/jones_v__hendrix_sharegpt.json
277
+ - path: json
278
+ type: sharegpt
279
+ conversation: chatml
280
+ data_files: train/argument_dataset/karcho_polselli_v__irs_sharegpt.json
281
+ - path: json
282
+ type: sharegpt
283
+ conversation: chatml
284
+ data_files: train/argument_dataset/lac_du_flambeau_band_v__coughlin_sharegpt.json
285
+ - path: json
286
+ type: sharegpt
287
+ conversation: chatml
288
+ data_files: train/argument_dataset/lindke_v__freed_sharegpt.json
289
+ - path: json
290
+ type: sharegpt
291
+ conversation: chatml
292
+ data_files: train/argument_dataset/loper_bright_enterprises__inc__v__raimondo__sec__of_comm__sharegpt.json
293
+ - path: json
294
+ type: sharegpt
295
+ conversation: chatml
296
+ data_files: train/argument_dataset/lora_v__united_states_sharegpt.json
297
+ - path: json
298
+ type: sharegpt
299
+ conversation: chatml
300
+ data_files: train/argument_dataset/macquarie_infrastructure_corp__v__moab_partners__l_p__sharegpt.json
301
+ - path: json
302
+ type: sharegpt
303
+ conversation: chatml
304
+ data_files: train/argument_dataset/mallory_v__norfolk_southern_railway_co__sharegpt.json
305
+ - path: json
306
+ type: sharegpt
307
+ conversation: chatml
308
+ data_files: train/argument_dataset/mcintosh_v__united_states_sharegpt.json
309
+ - path: json
310
+ type: sharegpt
311
+ conversation: chatml
312
+ data_files: train/argument_dataset/merrill_v__milligan_sharegpt.json
313
+ - path: json
314
+ type: sharegpt
315
+ conversation: chatml
316
+ data_files: train/argument_dataset/moore_v__harper_sharegpt.json
317
+ - path: json
318
+ type: sharegpt
319
+ conversation: chatml
320
+ data_files: train/argument_dataset/moore_v__united_states_sharegpt.json
321
+ - path: json
322
+ type: sharegpt
323
+ conversation: chatml
324
+ data_files: train/argument_dataset/moyle_v__united_states_sharegpt.json
325
+ - path: json
326
+ type: sharegpt
327
+ conversation: chatml
328
+ data_files: train/argument_dataset/muldrow_v__st__louis_sharegpt.json
329
+ - path: json
330
+ type: sharegpt
331
+ conversation: chatml
332
+ data_files: train/argument_dataset/murray_v__ubs_securities__llc_sharegpt.json
333
+ - path: json
334
+ type: sharegpt
335
+ conversation: chatml
336
+ data_files: train/argument_dataset/murthy__surgeon_gen__v__missouri_sharegpt.json
337
+ - path: json
338
+ type: sharegpt
339
+ conversation: chatml
340
+ data_files: train/argument_dataset/netchoice__llc_v__paxton_sharegpt.json
341
+ - path: json
342
+ type: sharegpt
343
+ conversation: chatml
344
+ data_files: train/argument_dataset/new_york_v__new_jersey_sharegpt.json
345
+ - path: json
346
+ type: sharegpt
347
+ conversation: chatml
348
+ data_files: train/argument_dataset/nra_v__vullo_sharegpt.json
349
+ - path: json
350
+ type: sharegpt
351
+ conversation: chatml
352
+ data_files: train/argument_dataset/o_connor_ratcliff_v__garnier_sharegpt.json
353
+ - path: json
354
+ type: sharegpt
355
+ conversation: chatml
356
+ data_files: train/argument_dataset/oh_adjutant_gen__s_dept__v__flra_sharegpt.json
357
+ - path: json
358
+ type: sharegpt
359
+ conversation: chatml
360
+ data_files: train/argument_dataset/ohio_v__epa_sharegpt.json
361
+ - path: json
362
+ type: sharegpt
363
+ conversation: chatml
364
+ data_files: train/argument_dataset/perez_v__sturgis_public_schools_sharegpt.json
365
+ - path: json
366
+ type: sharegpt
367
+ conversation: chatml
368
+ data_files: train/argument_dataset/pugin_v__garland_sharegpt.json
369
+ - path: json
370
+ type: sharegpt
371
+ conversation: chatml
372
+ data_files: train/argument_dataset/pulsifer_v__united_states_sharegpt.json
373
+ - path: json
374
+ type: sharegpt
375
+ conversation: chatml
376
+ data_files: train/argument_dataset/relentless__inc__v__dept__of_commerce_sharegpt.json
377
+ - path: json
378
+ type: sharegpt
379
+ conversation: chatml
380
+ data_files: train/argument_dataset/rudisill_v__mcdonough__sec__of_va_sharegpt.json
381
+ - path: json
382
+ type: sharegpt
383
+ conversation: chatml
384
+ data_files: train/argument_dataset/sackett_v__epa_sharegpt.json
385
+ - path: json
386
+ type: sharegpt
387
+ conversation: chatml
388
+ data_files: train/argument_dataset/samia_v__united_states_sharegpt.json
389
+ - path: json
390
+ type: sharegpt
391
+ conversation: chatml
392
+ data_files: train/argument_dataset/santos_zacaria_v__garland__att_y_gen__sharegpt.json
393
+ - path: json
394
+ type: sharegpt
395
+ conversation: chatml
396
+ data_files: train/argument_dataset/sec_v__cochran_sharegpt.json
397
+ - path: json
398
+ type: sharegpt
399
+ conversation: chatml
400
+ data_files: train/argument_dataset/sec_v__jarkesy_sharegpt.json
401
+ - path: json
402
+ type: sharegpt
403
+ conversation: chatml
404
+ data_files: train/argument_dataset/sheetz_v__county_of_el_dorado_sharegpt.json
405
+ - path: json
406
+ type: sharegpt
407
+ conversation: chatml
408
+ data_files: train/argument_dataset/slack_technologies__llc_v__pirani_sharegpt.json
409
+ - path: json
410
+ type: sharegpt
411
+ conversation: chatml
412
+ data_files: train/argument_dataset/smith_v__arizona_sharegpt.json
413
+ - path: json
414
+ type: sharegpt
415
+ conversation: chatml
416
+ data_files: train/argument_dataset/smith_v__spizzirri_sharegpt.json
417
+ - path: json
418
+ type: sharegpt
419
+ conversation: chatml
420
+ data_files: train/argument_dataset/smith_v__united_states_sharegpt.json
421
+ - path: json
422
+ type: sharegpt
423
+ conversation: chatml
424
+ data_files: train/argument_dataset/snyder_v__united_states_sharegpt.json
425
+ - path: json
426
+ type: sharegpt
427
+ conversation: chatml
428
+ data_files: train/argument_dataset/starbucks_corp__v__mckinney_sharegpt.json
429
+ - path: json
430
+ type: sharegpt
431
+ conversation: chatml
432
+ data_files: train/argument_dataset/students_for_fair_admissions_v__university_of_nc_sharegpt.json
433
+ - path: json
434
+ type: sharegpt
435
+ conversation: chatml
436
+ data_files: train/argument_dataset/texas_v__new_mexico_and_colorado_sharegpt.json
437
+ - path: json
438
+ type: sharegpt
439
+ conversation: chatml
440
+ data_files: train/argument_dataset/thornell_v__jones_sharegpt.json
441
+ - path: json
442
+ type: sharegpt
443
+ conversation: chatml
444
+ data_files: train/argument_dataset/truck_insurance_exchange_v__kaiser_gypsum_co__inc__sharegpt.json
445
+ - path: json
446
+ type: sharegpt
447
+ conversation: chatml
448
+ data_files: train/argument_dataset/trump_v__anderson_sharegpt.json
449
+ - path: json
450
+ type: sharegpt
451
+ conversation: chatml
452
+ data_files: train/argument_dataset/turkiye_halk_bankasi_a_s__v__united_states_sharegpt.json
453
+ - path: json
454
+ type: sharegpt
455
+ conversation: chatml
456
+ data_files: train/argument_dataset/twitter__inc__v__taamneh_sharegpt.json
457
+ - path: json
458
+ type: sharegpt
459
+ conversation: chatml
460
+ data_files: train/argument_dataset/tyler_v__hennepin_county_sharegpt.json
461
+ - path: json
462
+ type: sharegpt
463
+ conversation: chatml
464
+ data_files: train/argument_dataset/u_s___ex_rel__polansky_v__executive_health_sharegpt.json
465
+ - path: json
466
+ type: sharegpt
467
+ conversation: chatml
468
+ data_files: train/argument_dataset/u_s___ex_rel__schutte_v__supervalu_inc__sharegpt.json
469
+ - path: json
470
+ type: sharegpt
471
+ conversation: chatml
472
+ data_files: train/argument_dataset/united_states_trustee_v__john_q__hammons_fall_2006__llc_sharegpt.json
473
+ - path: json
474
+ type: sharegpt
475
+ conversation: chatml
476
+ data_files: train/argument_dataset/united_states_v__hansen_sharegpt.json
477
+ - path: json
478
+ type: sharegpt
479
+ conversation: chatml
480
+ data_files: train/argument_dataset/united_states_v__rahimi_sharegpt.json
481
+ - path: json
482
+ type: sharegpt
483
+ conversation: chatml
484
+ data_files: train/argument_dataset/united_states_v__texas_sharegpt.json
485
+ - path: json
486
+ type: sharegpt
487
+ conversation: chatml
488
+ data_files: train/argument_dataset/vidal__under_sec__of_comm__v__elster_sharegpt.json
489
+ - path: json
490
+ type: sharegpt
491
+ conversation: chatml
492
+ data_files: train/argument_dataset/warner_chappell_music__inc__v__nealy_sharegpt.json
493
+ - path: json
494
+ type: sharegpt
495
+ conversation: chatml
496
+ data_files: train/argument_dataset/wilkins_v__united_states_sharegpt.json
497
+ - path: json
498
+ type: sharegpt
499
+ conversation: chatml
500
+ data_files: train/argument_dataset/wilkinson_v__garland__att_y_gen__sharegpt.json
501
+ - path: json
502
+ type: sharegpt
503
+ conversation: chatml
504
+ data_files: train/argument_dataset/yegiazaryan_v__smagin_sharegpt.json
505
+
506
+ chat_template: chatml
507
+
508
+ unfrozen_parameters:
509
+ - ^lm_head.weight$
510
+ - ^model.embed_tokens.weight$
511
+ # input_layernorm layers
512
+ - model.layers.0.input_layernorm
513
+ - model.layers.1.input_layernorm
514
+ - model.layers.2.input_layernorm
515
+ - model.layers.3.input_layernorm
516
+ - model.layers.4.input_layernorm
517
+ - model.layers.5.input_layernorm
518
+ - model.layers.6.input_layernorm
519
+ - model.layers.7.input_layernorm
520
+ - model.layers.8.input_layernorm
521
+ - model.layers.9.input_layernorm
522
+ - model.layers.10.input_layernorm
523
+ - model.layers.11.input_layernorm
524
+ - model.layers.12.input_layernorm
525
+ - model.layers.13.input_layernorm
526
+ # mlp.down_proj layers
527
+ - model.layers.0.mlp.down_proj
528
+ - model.layers.1.mlp.down_proj
529
+ - model.layers.17.mlp.down_proj
530
+ - model.layers.19.mlp.down_proj
531
+ - model.layers.18.mlp.down_proj
532
+ - model.layers.5.mlp.down_proj
533
+ - model.layers.20.mlp.down_proj
534
+ - model.layers.2.mlp.down_proj
535
+ - model.layers.4.mlp.down_proj
536
+ - model.layers.6.mlp.down_proj
537
+ - model.layers.3.mlp.down_proj
538
+ - model.layers.16.mlp.down_proj
539
+ - model.layers.15.mlp.down_proj
540
+ - model.layers.13.mlp.down_proj
541
+ # mlp.gate_proj layers
542
+ - model.layers.0.mlp.gate_proj
543
+ - model.layers.1.mlp.gate_proj
544
+ - model.layers.2.mlp.gate_proj
545
+ - model.layers.3.mlp.gate_proj
546
+ - model.layers.22.mlp.gate_proj
547
+ - model.layers.21.mlp.gate_proj
548
+ - model.layers.20.mlp.gate_proj
549
+ - model.layers.23.mlp.gate_proj
550
+ - model.layers.19.mlp.gate_proj
551
+ - model.layers.4.mlp.gate_proj
552
+ - model.layers.18.mlp.gate_proj
553
+ - model.layers.17.mlp.gate_proj
554
+ - model.layers.5.mlp.gate_proj
555
+ - model.layers.24.mlp.gate_proj
556
+ # mlp.up_proj layers
557
+ - model.layers.4.mlp.up_proj
558
+ - model.layers.3.mlp.up_proj
559
+ - model.layers.5.mlp.up_proj
560
+ - model.layers.6.mlp.up_proj
561
+ - model.layers.7.mlp.up_proj
562
+ - model.layers.2.mlp.up_proj
563
+ - model.layers.8.mlp.up_proj
564
+ - model.layers.14.mlp.up_proj
565
+ - model.layers.13.mlp.up_proj
566
+ - model.layers.11.mlp.up_proj
567
+ - model.layers.9.mlp.up_proj
568
+ - model.layers.1.mlp.up_proj
569
+ - model.layers.15.mlp.up_proj
570
+ - model.layers.12.mlp.up_proj
571
+ # post_attention_layernorm layers
572
+ - model.layers.0.post_attention_layernorm
573
+ - model.layers.1.post_attention_layernorm
574
+ - model.layers.2.post_attention_layernorm
575
+ - model.layers.3.post_attention_layernorm
576
+ - model.layers.4.post_attention_layernorm
577
+ - model.layers.5.post_attention_layernorm
578
+ - model.layers.6.post_attention_layernorm
579
+ - model.layers.7.post_attention_layernorm
580
+ - model.layers.8.post_attention_layernorm
581
+ - model.layers.9.post_attention_layernorm
582
+ - model.layers.10.post_attention_layernorm
583
+ - model.layers.11.post_attention_layernorm
584
+ - model.layers.12.post_attention_layernorm
585
+ - model.layers.13.post_attention_layernorm
586
+ # self_attn.k_proj layers
587
+ - model.layers.25.self_attn.k_proj
588
+ - model.layers.22.self_attn.k_proj
589
+ - model.layers.19.self_attn.k_proj
590
+ - model.layers.20.self_attn.k_proj
591
+ - model.layers.17.self_attn.k_proj
592
+ - model.layers.24.self_attn.k_proj
593
+ - model.layers.23.self_attn.k_proj
594
+ - model.layers.18.self_attn.k_proj
595
+ - model.layers.21.self_attn.k_proj
596
+ - model.layers.27.self_attn.k_proj
597
+ - model.layers.15.self_attn.k_proj
598
+ - model.layers.10.self_attn.k_proj
599
+ - model.layers.6.self_attn.k_proj
600
+ - model.layers.5.self_attn.k_proj
601
+ # self_attn.o_proj layers
602
+ - model.layers.13.self_attn.o_proj
603
+ - model.layers.7.self_attn.o_proj
604
+ - model.layers.12.self_attn.o_proj
605
+ - model.layers.10.self_attn.o_proj
606
+ - model.layers.5.self_attn.o_proj
607
+ - model.layers.21.self_attn.o_proj
608
+ - model.layers.6.self_attn.o_proj
609
+ - model.layers.19.self_attn.o_proj
610
+ - model.layers.8.self_attn.o_proj
611
+ - model.layers.20.self_attn.o_proj
612
+ - model.layers.22.self_attn.o_proj
613
+ - model.layers.9.self_attn.o_proj
614
+ - model.layers.17.self_attn.o_proj
615
+ - model.layers.11.self_attn.o_proj
616
+ # self_attn.q_proj layers
617
+ - model.layers.12.self_attn.q_proj
618
+ - model.layers.13.self_attn.q_proj
619
+ - model.layers.9.self_attn.q_proj
620
+ - model.layers.8.self_attn.q_proj
621
+ - model.layers.10.self_attn.q_proj
622
+ - model.layers.14.self_attn.q_proj
623
+ - model.layers.11.self_attn.q_proj
624
+ - model.layers.15.self_attn.q_proj
625
+ - model.layers.26.self_attn.q_proj
626
+ - model.layers.6.self_attn.q_proj
627
+ - model.layers.7.self_attn.q_proj
628
+ - model.layers.16.self_attn.q_proj
629
+ - model.layers.5.self_attn.q_proj
630
+ - model.layers.25.self_attn.q_proj
631
+ # model.norm layers
632
+ # self_attn.v_proj layers
633
+ - model.layers.23.self_attn.v_proj
634
+ - model.layers.14.self_attn.v_proj
635
+ - model.layers.15.self_attn.v_proj
636
+ - model.layers.19.self_attn.v_proj
637
+ - model.layers.3.self_attn.v_proj
638
+ - model.layers.18.self_attn.v_proj
639
+ - model.layers.25.self_attn.v_proj
640
+ - model.layers.4.self_attn.v_proj
641
+ - model.layers.17.self_attn.v_proj
642
+ - model.layers.22.self_attn.v_proj
643
+ - model.layers.20.self_attn.v_proj
644
+ - model.layers.13.self_attn.v_proj
645
+ - model.layers.6.self_attn.v_proj
646
+ - model.layers.27.self_attn.v_proj
647
+
648
+ val_set_size: 0.05
649
+ output_dir: ./outputs/magistrate-3.2-3b
650
+
651
+ sequence_len: 8192
652
+ sample_packing: true
653
+ eval_sample_packing: false
654
+ pad_to_sequence_len: true
655
+
656
+ adapter:
657
+
658
+ wandb_project:
659
+ wandb_entity:
660
+ wandb_watch:
661
+ wandb_name:
662
+ wandb_log_model:
663
+
664
+ gradient_accumulation_steps: 8
665
+ micro_batch_size: 1
666
+ num_epochs: 3
667
+ optimizer: paged_adamw_32bit
668
+ lr_scheduler: cosine
669
+ learning_rate: 2e-4
670
+
671
+ train_on_inputs: false
672
+ group_by_length: false
673
+ bf16: auto
674
+ fp16:
675
+ tf32: false
676
+
677
+ gradient_checkpointing: true
678
+ early_stopping_patience:
679
+ resume_from_checkpoint:
680
+ local_rank:
681
+ logging_steps: 1
682
+ xformers_attention:
683
+ flash_attention: true
684
+ s2_attention:
685
+
686
+ warmup_steps: 1000
687
+ evals_per_epoch: 2
688
+ eval_table_size:
689
+ eval_max_new_tokens: 128
690
+ saves_per_epoch: 1
691
+ debug:
692
+ deepspeed: deepspeed_configs/zero3.json
693
+ weight_decay: 0.0
694
+ fsdp:
695
+ fsdp_config:
696
+ special_tokens:
697
+ eos_token: "<|im_end|>"
698
+ pad_token: "<|end_of_text|>"
699
+ tokens:
700
+ - "<|im_start|>"
701
+ - "<|im_end|>"
702
+ ```
703
+
704
+ </details><br>
705
+
706
+
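The config above sets `chat_template: chatml` and registers `<|im_start|>` and `<|im_end|>` as tokens, so prompts are expected in ChatML form. A minimal sketch of that layout in plain Python follows. The helper name `to_chatml` and the sample messages are illustrative assumptions and not part of this repository; in practice the tokenizer's own chat template is the safer source of truth.

```python
def to_chatml(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts as a ChatML prompt string.

    Hypothetical helper for illustration only.
    """
    parts = []
    for message in messages:
        # Each turn is wrapped as: <|im_start|>ROLE\nCONTENT<|im_end|>\n
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    if add_generation_prompt:
        # Leave the assistant turn open so the model continues from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


# Sample conversation (made up for this sketch).
messages = [
    {"role": "system", "content": "You are a legal research assistant."},
    {"role": "user", "content": "Summarize the question presented in Sackett v. EPA."},
]

prompt = to_chatml(messages)
print(prompt)
```

Because `eos_token` is set to `<|im_end|>` in the config, generation would normally stop at the end of the assistant turn.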
707
+ ## Model description
708
+
709
+ Magistrate-3.2-3b-it is a legal assistant specializing in US Supreme Court case law and US Federal regulations.
710
+
711
+ The base model was pretrained on ~250M tokens containing no synthetic legal data. The instruct model's training data does include synthetic data.
712
+
713
+ ## Intended uses & limitations
714
+
715
+ This model is intended for research and for continued development of its legal specialization. You are liable for all model outputs.
716
+
717
+ ## Training and evaluation data
718
+
719
+ This model was trained on a mix of standard open-source datasets, including OpenHermes-2.5, hermes-function-calling-v1, SystemChat-2.0, and selected entries from The Tome.
720
+ The training data also includes a comprehensive, non-synthetic argument dataset (the `train/argument_dataset` files in the config above). This dataset is a work in progress but has shown promising results so far.
721
+
722
+ ## Training procedure
723
+
724
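+ Both the pretraining and SFT stages were Spectrum fine-tunes using the top 35% setting. For SFT, the trainable modules are those listed under `unfrozen_parameters` in the config above. Thanks to the Cognitive Computations team for their work on Spectrum.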
725
+
726
+ + Pretraining methodology based on Cohere's paper: [To Code, or Not To Code? Exploring Impact of Code in Pre-training](https://arxiv.org/abs/2408.10914)
727
+ + Instruct finetune largely based on OpenHermes-2.5 and hermes-function-calling
728
+
729
+ ### Training hyperparameters
730
+
731
+ The following hyperparameters were used during training:
732
+ - learning_rate: 0.0002
733
+ - train_batch_size: 1
734
+ - eval_batch_size: 1
735
+ - seed: 42
736
+ - distributed_type: multi-GPU
737
+ - num_devices: 2
738
+ - gradient_accumulation_steps: 8
739
+ - total_train_batch_size: 16
740
+ - total_eval_batch_size: 2
741
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
742
+ - lr_scheduler_type: cosine
743
+ - lr_scheduler_warmup_steps: 1000
744
+ - num_epochs: 3
745
+
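The numbers in this list can be sanity-checked with a few lines of Python. The sketch below recomputes the effective batch size and evaluates a linear-warmup-then-cosine learning-rate curve. The schedule shape (linear warmup, cosine decay to zero) and the total of 6102 steps (taken from the last logged step in the results table) are assumptions; the card itself only states `cosine` and 1000 warmup steps.

```python
import math

# Values copied from the hyperparameter list.
MICRO_BATCH_SIZE = 1
GRAD_ACCUM_STEPS = 8
NUM_DEVICES = 2
BASE_LR = 2e-4
WARMUP_STEPS = 1000
TOTAL_STEPS = 6102  # assumption: last step logged in the results table


def effective_batch_size(micro, accum, devices):
    """Sequences contributing to one optimizer update across all devices."""
    return micro * accum * devices


def lr_at(step, base_lr=BASE_LR, warmup=WARMUP_STEPS, total=TOTAL_STEPS):
    """Learning rate under assumed linear warmup followed by cosine decay to 0."""
    if step < warmup:
        return base_lr * step / warmup
    progress = (step - warmup) / max(1, total - warmup)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * min(1.0, progress)))


# Matches total_train_batch_size: 16 in the list above.
print(effective_batch_size(MICRO_BATCH_SIZE, GRAD_ACCUM_STEPS, NUM_DEVICES))

for step in (0, 500, 1000, 3551, 6102):
    print(step, f"{lr_at(step):.2e}")
```

Under these assumptions the warmup covers roughly the first half of epoch 1, so the peak learning rate of 2e-4 is reached just before the first mid-epoch evaluation at step 1017.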
746
+ ### Training results
747
+
748
+ | Training Loss | Epoch | Step | Validation Loss |
749
+ |:-------------:|:------:|:----:|:---------------:|
750
+ | 1.3754 | 0.0005 | 1 | 1.7429 |
751
+ | 1.0 | 0.5002 | 1017 | 0.8864 |
752
+ | 0.9482 | 1.0005 | 2034 | 0.8395 |
753
+ | 0.6817 | 1.4987 | 3051 | 0.8063 |
754
+ | 0.697 | 1.9991 | 4068 | 0.7580 |
755
+ | 0.3769 | 2.4966 | 5085 | 0.8140 |
756
+ | 0.4278 | 2.9965 | 6102 | 0.8067 |
757
+
758
+
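The table can be read programmatically to find the checkpoint with the lowest validation loss. The sketch below hard-codes the rows from the table; the variable names are illustrative. It shows that validation loss bottoms out at step 4068 (end of epoch 2) and is higher at the final step, while training loss keeps falling. With a single run this is only suggestive, but that pattern is commonly read as the onset of overfitting in epoch 3.

```python
# (training_loss, epoch, step, validation_loss), copied from the table above.
rows = [
    (1.3754, 0.0005, 1, 1.7429),
    (1.0, 0.5002, 1017, 0.8864),
    (0.9482, 1.0005, 2034, 0.8395),
    (0.6817, 1.4987, 3051, 0.8063),
    (0.697, 1.9991, 4068, 0.7580),
    (0.3769, 2.4966, 5085, 0.8140),
    (0.4278, 2.9965, 6102, 0.8067),
]

# Checkpoint with the lowest validation loss.
best = min(rows, key=lambda row: row[3])
final = rows[-1]

print(f"best step:  {best[2]} (epoch {best[1]}, val loss {best[3]:.4f})")
print(f"final step: {final[2]} (epoch {final[1]}, val loss {final[3]:.4f})")
print(f"final minus best val loss: {final[3] - best[3]:.4f}")
```

The evaluation loss reported at the top of the card (0.8067) corresponds to the final row, not to the best row.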
759
+ ### Framework versions
760
+
761
+ - Transformers 4.45.0
762
+ PyTorch 2.3.1+cu121
763
+ - Datasets 2.21.0
764
+ - Tokenizers 0.20.0
magistrate-3.2-3b-it.Q2_K.gguf ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:fc2ca10615861cabfb301b25a0372eea44ce951f23cd91b1e35c305f945f649a
3
+ size 1358732768
magistrate-3.2-3b-it.bf16.gguf ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:289229951627ef2280b68229140f472cc707262606aad6a313cb0b9859f13b7a
3
+ size 6428492320
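The two `.gguf` entries above are Git LFS pointer files, not the model weights themselves: each records a spec version, a SHA-256 object ID, and the size in bytes of the real file. A small sketch of reading one such pointer in Python follows; `parse_lfs_pointer` is a hypothetical helper written for this example, and the pointer text is copied from the Q2_K entry above.

```python
def parse_lfs_pointer(text):
    """Parse the key/value lines of a Git LFS pointer file into a dict.

    Hypothetical helper for illustration only.
    """
    fields = {}
    for line in text.strip().splitlines():
        # Each line has the form "key value", separated by the first space.
        key, _, value = line.partition(" ")
        fields[key] = value
    hash_algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": hash_algo,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


pointer_q2k = """version https://git-lfs.github.com/spec/v1
oid sha256:fc2ca10615861cabfb301b25a0372eea44ce951f23cd91b1e35c305f945f649a
size 1358732768
"""

info = parse_lfs_pointer(pointer_q2k)
size_gib = info["size_bytes"] / 2**30
print(info["hash_algo"], info["digest"][:12], f"{size_gib:.2f} GiB")
```

Applied to both pointers, this gives roughly 1.27 GiB for the Q2_K file and 5.99 GiB for the bf16 file.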